The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751-761] is an incomplete data problem whereby two independent samples from the lifetime distribution G, $\mathcal{X}_m = (X_1, \ldots, X_m)$ and $\mathcal{Z}_n = (Z_1, \ldots, Z_n)$, are observed subject to a form of coarsening. Specifically, sample $\mathcal{X}_m$ is fully observed while $\mathcal{Y}_n = (Y_1, \ldots, Y_n)$ is observed instead of $\mathcal{Z}_n$, where $Y_i = U_i Z_i$ and $(U_1, \ldots, U_n)$ is an independent sample from the standard uniform distribution. Vardi [Biometrika 76 (1989) 751-761] showed that this model unifies several important statistical problems, such as the deconvolution of an exponential random variable, estimation under a decreasing density constraint and an estimation problem in renewal processes. In this paper, we establish the large-sample properties of kernel density estimators under the multiplicative censoring model. We first construct a strong approximation for the process $\sqrt{k}(\hat G - G)$, where $\hat G$ is a solution of the nonparametric score equation based on $(\mathcal{X}_m, \mathcal{Y}_n)$, and $k = m + n$ is the total sample size. Using this strong approximation and a result on the global modulus of continuity, we establish conditions for the strong uniform consistency of kernel density estimators. We also make use of this strong approximation to study the weak convergence and integrated squared error properties of these estimators. We conclude by extending our results to the setting of length-biased sampling.

1. Introduction. Vardi [50] introduced an incomplete data problem unifying several statistical models. The problem consisted of inferring the lifetime distribution of interest G through a random sample $X_1, X_2, \ldots, X_m$ drawn directly from G and a random sample $Y_1, Y_2, \ldots, Y_n$ drawn from the distribution F with density function

$$f(y) = \int_{y<z} z^{-1}\, dG(z), \qquad y > 0. \tag{1.1}$$

Since f is a decreasing density function, Y may be expressed as the product of two independent random variables: a nonnegative variate Z and a standard uniform variate U. From the form of (1.1), it is easy to see that in this case Z must be distributed according to G. This representation suggests that only a random fraction of Z may be observed, motivating the nomenclature multiplicative censoring used to describe this incomplete data scheme. The likelihood based on the k = m + n observations $X_1 = x_1, \ldots, X_m = x_m$ and $Y_1 = y_1, \ldots, Y_n = y_n$ is

$$L(G) = \prod_{i=1}^{m} dG(x_i) \prod_{j=1}^{n} \int_{y_j<z} z^{-1}\, dG(z). \tag{1.2}$$

As discussed by Vardi [50], the multiplicative censoring model arises from 
the deconvolution of an exponential random variable, estimation under a de- 
creasing density constraint and an estimation problem in renewal processes. 
The literature on these and related problems is vast. Estimation under a de- 
creasing density constraint dates back to the seminal work of Grenander [22] , 
with key contributions by Groeneboom [23] and Huang and Wellner [26]. 
The estimation problem in renewal processes discussed in [50] is closely tied 
to important applications in cross-sectional sampling and prevalent cohort 
studies in epidemiology (length-biased sampling) and in labor force stud- 
ies in economics (stock sampling). The multiplicative censoring model and 
its variants have been studied by [6, 8, 25, 45, 50] and [51], among others. 
Vardi [51] studied the asymptotic behavior of solutions of the nonparametric 
score equation under the multiplicative censoring model. 

As will be discussed later, multiplicative censoring and left-truncated 
right-censored data are intricately tied. The latter have been extensively 
studied in the statistical literature. Their importance stems mainly, al- 
though not exclusively, from the widespread use of prevalent cohort study 
designs to estimate survival from onset of a disease. In such studies, patients 
with prevalent disease are identified at some instant in calendar time, often 
through a cross-sectional survey. These patients are then followed forward 
in time until death or loss to follow-up. If no temporal change in the inci- 
dence of disease has occurred during the period covering observed onsets, 
a stationary Poisson process may adequately describe the incidence pattern 
of the disease; see [2-4] and [53] . In this case, the left-truncation variable is 
uniformly distributed, and the failure time data are said to be length-biased. 




The likelihood for the observed data is then given by (1.2), where

$$G(t) = \mu_U^{-1} \int_0^t u\, dF_U(u),$$

$\mu_U = \int_0^\infty u\, dF_U(u)$ and $F_U$, the unbiased distribution, is the underlying distribution function about which we would like to infer; see Section 6 and [3]. Because we require $\mu_U < \infty$ in the above, we restrict our attention to dis-



The connection between the multiplicative censoring model and prevalent 
cohort studies under the stationarity assumption has revived interest in the 
former. Nonetheless, there appears to be no result in the literature on density 
estimation under the multiplicative censoring model, despite its importance 
in applied sciences. A recent application described by Kvam [28] concerns 
nanoscience and the measurement of carbon nanotubes. As discussed by 
Silverman [43], density estimation can be useful for purposes of data explo- 
ration and presentation. It is effective in the investigation of modes (determi- 
nation of multimodality and identification of modes) and tail behavior (rate 
of tail decay). These features are especially important in length-biased sam- 
pling and survival analysis, where skewness is often pervasive and differential 
subgroup characteristics may lead to multimodality. An additional motiva- 
tion for the study of density estimation under multiplicative censoring stems 
from the fact that nonparametric regression of right-censored length-biased 
data has not been addressed in the literature. In view of the intricate link 
between density estimation and nonparametric regression (see [35]), a study 
of density estimation under multiplicative censoring provides foundations 
for studying nonparametric regression of right-censored length-biased data. 

Among the various methods of density estimation, kernel smoothing is 
particularly appealing for both its simplicity and its interpretability (e.g.,
as a limiting pointwise average of shifted histograms). It provides a unifying 
framework in that, as discussed in [40], each of finite difference density es- 
timation, smoothing by convolution, orthogonal series approximations and 
other smoothing methods historically used in the various applied sciences 
can be seen as instances of kernel smoothing. This article studies the large- 
sample properties of kernel density estimators in the setting of multiplica- 
tive censoring. Pioneered by Silverman [42], the approach adopted consists 
of constructing strong approximations of the empirical density process. 

Although under the multiplicative censoring model we may avoid com- 
plexities altogether by performing estimation using the uncensored observa- 
tions alone, use of the full data is motivated by at least two reasons. First, 
although discarding the censored cases under the canonical multiplicative 
censoring scheme does not compromise consistency, the same cannot be said 
under the related length-biased sampling scheme, even though these schemes 
lead to the same likelihood. This occurs because, under length-biased sampling, the uncensored cases do not emanate directly from the (length-biased version of the) distribution of interest. Systematic exclusion of the censored cases would therefore lead to inconsistency. This fact motivates the study of both censored and uncensored cases under multiplicative censoring. Second, due to the informativeness of the censoring mechanism, ignoring the censored observations may lead to a substantial loss of efficiency. Because the asymptotic covariance function of the nonparametric maximum likelihood estimator of G does not have an explicit form, this phenomenon is difficult to quantify in the nonparametric setting (see the discussion on page 1024 of [51]); however, a parametric example may be illustrative. Suppose that the uncensored observations emanate from a Gamma distribution, say with mean $2\theta$ and variance $2\theta^2$; then the censored observations are exponentially distributed with mean $\theta$. The asymptotic relative efficiency of the full-sample MLE relative to the uncensored-sample MLE is $1 + v/2$, where $v > 0$ is the asymptotic relative frequency of censored observations to uncensored observations. If, for example, $v = 1$, indicating that uncensored and censored cases arise in equal numbers asymptotically, use of the full sample provides a fifty percent gain in efficiency.
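The efficiency figure can be checked with a small simulation. In this Gamma$(2,\theta)$ example the MLEs have closed forms, $\hat\theta = \sum_i x_i/(2m)$ from the uncensored sample and $\hat\theta = (\sum_i x_i + \sum_j y_j)/(2m+n)$ from the full sample, so the variance ratio is $(2m+n)/(2m) = 1 + v/2$ with $v = n/m$. The sketch below (Python with NumPy; the function names are hypothetical and only illustrate the example) compares that value with a Monte Carlo estimate.

```python
import numpy as np

def are_full_vs_uncensored(v):
    """ARE 1 + v/2: a Gamma(2, theta) observation carries Fisher information
    2/theta^2, an Exponential(theta) observation 1/theta^2."""
    return 1.0 + v / 2.0

def theta_mles(x, y):
    """Closed-form MLEs of theta: uncensored-only and full-sample."""
    m, n = len(x), len(y)
    theta_unc = x.sum() / (2 * m)                    # Gamma(2, theta) has mean 2 theta
    theta_full = (x.sum() + y.sum()) / (2 * m + n)
    return theta_unc, theta_full

def simulated_variance_ratio(theta=1.0, m=200, n=200, reps=4000, seed=0):
    """Monte Carlo estimate of Var(theta_unc) / Var(theta_full)."""
    rng = np.random.default_rng(seed)
    est = np.empty((reps, 2))
    for r in range(reps):
        x = rng.gamma(shape=2.0, scale=theta, size=m)    # X_i ~ G
        z = rng.gamma(shape=2.0, scale=theta, size=n)    # latent Z_j ~ G
        y = rng.uniform(size=n) * z                      # observed Y_j = U_j Z_j
        est[r] = theta_mles(x, y)
    return est[:, 0].var() / est[:, 1].var()

print(are_full_vs_uncensored(1.0))   # 1.5
print(simulated_variance_ratio())
```

With $m = n$ the Monte Carlo ratio should sit near 1.5, the fifty percent gain quoted above.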

Following [27], hereafter referred to as KMT, and [15], we first construct a strong approximation for the process $\sqrt{k}(\hat G - G)$, where $\hat G$ is a solution of the nonparametric score equation based on $(\mathcal{X}_m, \mathcal{Y}_n)$. The literature on
strong approximations is vast. Recent reviews on empirical processes, strong 
approximations and the KMT construction include [17] and [30]. Using this 
strong approximation and a result on the global modulus of continuity, we 
obtain the strong uniform consistency of the kernel density estimators of the 
density function g associated to G and find a sequence of Gaussian processes 
strongly uniformly approximating the empirical kernel density process. Us- 
ing these results, we study the integrated squared error properties of the 
kernel density estimators. 

The layout of the paper is as follows. In Section 2, we introduce our 
notation and present some preliminaries. In Section 3, we find a sequence 
of Gaussian processes that strongly uniformly approximates the empirical 
process $\sqrt{k}(\hat G - G)$ and study its global modulus of continuity. We use these
results to study the asymptotic behavior of the kernel density estimators 
in Section 4. It is shown, in particular, that the kernel density estimators 
are strongly consistent and asymptotically Gaussian. Section 5 is devoted 
to the integrated squared error properties of the kernel density estimators 
and includes results from a preliminary small-sample simulation study. We 
show how our results can be extended to length-biased sampling with right- 
censoring in Section 6 and present concluding remarks in Section 7. The 
claim and theorems are proved in the Appendix while lemmas are proved in 
the supplementary material [1]. 
2. Preliminaries. We consider the random multiplicative censoring model introduced in [50], whereby two independent random samples $\mathcal{X}_m = (X_1, \ldots, X_m)$ and $\mathcal{Z}_n = (Z_1, \ldots, Z_n)$ are drawn from the lifetime distribution G and a third independent sample $\mathcal{U}_n = (U_1, \ldots, U_n)$ from the standard uniform distribution. Let $Y_i = Z_i U_i$, $i = 1, \ldots, n$, and write $\mathcal{Y}_n = (Y_1, \ldots, Y_n)$. Then $\mathcal{Y}_n$ is a random sample from the absolutely continuous distribution F with density given by (1.1). The observed data consist of $(\mathcal{X}_m, \mathcal{Y}_n)$ while $(\mathcal{Z}_n, \mathcal{U}_n)$ is unobserved.
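To make the sampling scheme concrete, the sketch below draws $(\mathcal{X}_m, \mathcal{Y}_n)$ for the illustrative choice $G = $ Gamma$(2,1)$, which is our assumption and not part of the model. For this G, (1.1) gives $f(y) = e^{-y}$, and integrating (1.1) yields the identity $F(t) = G(t) + t f(t)$, which the code compares with the empirical distribution of the simulated $Y_j$.

```python
import numpy as np

def simulate_multiplicative_censoring(m, n, sample_g, rng):
    """Draw X_1..X_m ~ G (fully observed) and Y_j = U_j Z_j with Z_j ~ G and
    U_j ~ Uniform(0, 1) independent; only (x, y) are returned, as in the model."""
    x = sample_g(m, rng)
    z = sample_g(n, rng)              # latent lifetimes, never observed
    u = rng.uniform(size=n)           # latent uniform fractions
    return x, u * z

def gamma2(size, rng):
    # Illustrative lifetime distribution G = Gamma(shape=2, scale=1)
    return rng.gamma(shape=2.0, scale=1.0, size=size)

rng = np.random.default_rng(1)
x, y = simulate_multiplicative_censoring(1_000, 200_000, gamma2, rng)

t = 1.0
G_t = 1.0 - np.exp(-t) * (1.0 + t)    # Gamma(2, 1) distribution function at t
f_t = np.exp(-t)                      # f(t) = integral over z > t of z^{-1} dG(z)
F_t = G_t + t * f_t                   # F(t) = G(t) + t f(t), here 1 - e^{-t}
print(F_t, np.mean(y <= t))
```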

We begin with the score equation derived from the likelihood L(G) given by (1.2). Let $G_m$ and $F_n$ be, respectively, the empirical distribution functions based on the uncensored observations $x_1, \ldots, x_m$ and the censored cases $y_1, \ldots, y_n$, and write $\hat p = m/k$, where $k = m + n$. For simplicity, assume all observations are distinct, and denote by $t_1 < \cdots < t_k$ the values taken by $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$. The distribution function $\hat G$ satisfies the nonparametric score equation if, for all $t > 0$,

$$d\hat G(t) = \hat p\, dG_m(t) + (1-\hat p)\left[\int_{0<y\le t} \frac{dF_n(y)}{\int_{y<z} z^{-1}\, d\hat G(z)}\right] t^{-1}\, d\hat G(t), \tag{2.1}$$

while $\sum_{j=1}^{k} d\hat G(t_j) = 1$ and $d\hat G(t_j) \ge 0$, $j = 1, \ldots, k$; see [51], page 1025. Integrating both sides of (2.1), we obtain

$$\hat G(t) = \hat p\, G_m(t) + (1-\hat p)\int_{0<x\le t}\left[\int_{0<y\le x} \frac{dF_n(y)}{\int_{y<z} z^{-1}\, d\hat G(z)}\right] x^{-1}\, d\hat G(x),$$

where the final integrand is defined to be 0 for $x > t_k$. We say that a sequence of real numbers $\gamma_{m,n}$ satisfies assumption (A0) if

$$\sum_{m,n} \cdots < \infty,$$

where the summation is understood to range over subsample sizes m and n, jointly taken to infinity, so that $\hat p \to p \in (0, 1]$. To circumvent problems related to a singularity at the origin, we select a sequence of positive real numbers $\gamma_{m,n}$ satisfying (A0) and consider solutions $\hat G$ of (2.1) assigning zero mass below $\gamma_{m,n}$. All results derived in this article apply to any solution of (2.1) with this property. The existence of such solutions is an important fact.

Claim 1. Suppose that (A0) holds. Then, for each m and n sufficiently large, (2.1) has a solution $\hat G$ such that $\hat G(u) = 0$ for each $u < \gamma_{m,n}$.

If there exists some $\gamma_0 > 0$ such that $G(\gamma_0) = 0$, assumption (A0) is not required. We may simply choose $\gamma_{m,n} = \gamma_0$, and because any solution of (2.1) will have zero mass below $\gamma_{m,n}$, Claim 1 follows directly from [50].
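Solutions of (2.1) are usually computed by an EM (self-consistency) iteration, as proposed by Vardi [50]. The sketch below is one discrete version written under our own conventions. Masses are placed on the pooled observations $t_1 < \cdots < t_k$, and $\hat f(y)$ is evaluated as $\sum_{t_l \ge y} d\hat G(t_l)/t_l$, inclusive of y so that the largest censored observation keeps positive likelihood. These conventions, and the absence of the $\gamma_{m,n}$ truncation, are assumptions of the sketch and not necessarily the construction used in the paper. Each update maps $d\hat G$ to the right-hand side of (2.1), so a fixed point solves the discretized score equation, and the masses always sum to one.

```python
import numpy as np

def npmle_multiplicative_censoring(x, y, n_iter=5000, tol=1e-12):
    """EM (self-consistency) iteration for a discrete solution of the score equation (2.1).

    Mass is placed on the pooled, sorted observations t_1 < ... < t_k. One update is
      p_j <- [1{t_j uncensored} + (p_j / t_j) * sum_{y_i <= t_j} 1 / f_hat(y_i)] / k,
    with f_hat(t_j) = sum_{l >= j} p_l / t_l: the right-hand side of (2.1), p_hat = m/k.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = x.size, y.size
    k = m + n
    t = np.concatenate([x, y])
    is_y = np.concatenate([np.zeros(m, bool), np.ones(n, bool)])
    order = np.argsort(t)
    t, is_y = t[order], is_y[order]
    p = np.full(k, 1.0 / k)                               # uniform starting masses
    for _ in range(n_iter):
        q = p / t                                         # dG_hat(t_l) / t_l
        f_hat = np.cumsum(q[::-1])[::-1]                  # suffix sums: f_hat at each t_j
        w = np.cumsum(np.where(is_y, 1.0 / f_hat, 0.0))   # sum over censored y_i <= t_j
        p_new = (np.where(is_y, 0.0, 1.0) + q * w) / k
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return t, p

rng = np.random.default_rng(2)
x = rng.gamma(2.0, 1.0, size=500)                             # uncensored sample, G = Gamma(2, 1)
y = rng.uniform(size=500) * rng.gamma(2.0, 1.0, size=500)     # censored sample Y = U Z
t, p = npmle_multiplicative_censoring(x, y)
G_hat_2 = p[t <= 2.0].sum()
print(p.sum(), G_hat_2, 1.0 - 3.0 * np.exp(-2.0))             # last value: true G(2)
```

Summing the update over j gives $(m + n)/k = 1$ at every step, which is why no renormalization is needed.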
Define $U_{m,n} = \sqrt{k}(\hat G - G)$, $W_{X,m} = \sqrt{m}(G_m - G)$, $W_{Y,n} = \sqrt{n}(F_n - F)$, $\hat f(t) = \int_{t<z} z^{-1}\, d\hat G(z)$ and

$$W_{m,n}(t) = \sqrt{\hat p}\, W_{X,m}(t) + \sqrt{1-\hat p}\, \hat f(t) \int_{0<v\le t} W_{Y,n}(v)\, d\!\left[\frac{1}{\hat f(v)}\right]. \tag{2.2}$$



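We observe, in particular, that

$$|W_{m,n}(t)| \le \sqrt{\hat p}\, |W_{X,m}(t)| + \sqrt{1-\hat p}\, \sup_{0<v\le t} |W_{Y,n}(v)| \tag{2.3}$$

for each $t > 0$. As in [51], we have that

$$W_{m,n}(t) = \hat p\, U_{m,n}(t) + (1-\hat p)\, \hat f(t) \int_{0<y\le t} y \left[\int_{y<z} \frac{U_{m,n}(z)}{z^2}\, dz\right] d\!\left[\frac{1}{\hat f(y)}\right].$$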



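The process $W_{m,n}$ can therefore be expressed as the image of a linear operator applied on $U_{m,n}$. To see this, we define the operator $\mathcal{Q}_{m,n}$ pointwise as $\mathcal{Q}_{m,n}(u)(t) = \hat f(t)\, \mathcal{A}_{m,n}(u)(t)$, where

$$\mathcal{A}_{m,n}(u)(t) = \int_{0<y\le t} y \left[\int_{y<z} \frac{u(z)}{z^2}\, dz\right] d\!\left[\frac{1}{\hat f(y)}\right].$$

Then, we may write $\mathcal{F}_{m,n} = \hat p\, \mathcal{I} + (1-\hat p)\, \mathcal{Q}_{m,n}$, with $\mathcal{I}(u) = u$ the identity map, and observe that

$$W_{m,n} = \mathcal{F}_{m,n}(U_{m,n}). \tag{2.4}$$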



Denoting by $D_0[0,\infty]$ the space of cadlag functions vanishing at 0 and $\infty$ endowed with the uniform topology (the topology induced by the supremum norm over $[0,\infty]$, $\|u\|_\infty = \sup_{0\le t\le\infty} |u(t)|$), it is not difficult to see that $\mathcal{I}$, $\mathcal{Q}_{m,n}$ and $\mathcal{F}_{m,n}$ are bounded linear operators on $D_0[0,\infty]$, and, in view of Lemma 3 of [51], that $\mathcal{F}_{m,n}$ has a bounded inverse satisfying $\|\mathcal{F}_{m,n}^{-1}\| \le 2/\hat p^2$. As in [51], it holds that if $\hat p \to p \in (0,1]$ as $m, n \to \infty$, then, for each $u \in D_0[0,\infty]$, we have that

$$\|\mathcal{F}_{m,n}(u) - \mathcal{F}(u)\|_\infty \to 0,$$

where the limit operators are $\mathcal{F} = p\,\mathcal{I} + (1-p)\,\mathcal{Q}$, $\mathcal{Q}(u)(t) = f(t)\,\mathcal{A}(u)(t)$ and

$$\mathcal{A}(u)(t) = \int_{0<y\le t} y \left[\int_{y<z} \frac{u(z)}{z^2}\, dz\right] d\!\left[\frac{1}{f(y)}\right].$$

We may then conclude that $\mathcal{Q}$ and $\mathcal{F}$ are also bounded linear operators on $D_0[0,\infty]$ and that $\mathcal{F}$ has a bounded inverse satisfying $\|\mathcal{F}^{-1}\| \le 2/p^2$. Vardi [51] proved the uniform strong consistency of $\hat G$ using (2.4). Instead, we obtain it as a corollary of Lemma 1 below.
Of importance will be the fact, proved in [51], that the inverse operator $\mathcal{F}^{-1}$ has the following pointwise representation:

$$\mathcal{F}^{-1}(u)(t) = p^{-1} u(t) + \int_0^\infty \Phi(t,x)\, u(x)\, dx \tag{2.5}$$

with kernel $\Phi$ satisfying, for each t and x, the constraints

$$p^2\, \Phi(t,x) + (1-p)\, \Lambda_0(t,x) + p(1-p) \int_0^\infty \Phi(t,z)\, \Lambda_0(z,x)\, dz = 0 \tag{2.6}$$

and

$$\int_0^\infty \Phi(t,z)\, \Lambda_0(z,x)\, dz = \int_0^\infty \Lambda_0(t,z)\, \Phi(z,x)\, dz, \tag{2.7}$$

where we have defined $\Lambda_0(t,x) = f(t)\, x^{-2} \int_{0<y\le t\wedge x} y\, d[1/f(y)]$.
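A numerical check on the kernel $\Lambda_0$ may help fix ideas; the check is our illustration and is not taken from [51]. Since $\Lambda_0 \ge 0$ and exchanging the order of integration gives $\int_0^\infty \Lambda_0(t,x)\,dx = f(t)\int_{0<y\le t} d[1/f(y)] = 1 - f(t)/f(0+)$, the limit operator $\mathcal{Q}$ has supremum norm at most one. The sketch evaluates the integral for the illustrative choice $G = $ Gamma$(2,1)$, where $f(y) = e^{-y}$ and $\int_0^s y\, d[1/f(y)] = (s-1)e^s + 1$.

```python
import numpy as np

def trapezoid(fx, x):
    """Composite trapezoidal rule, written out to avoid NumPy-version differences."""
    return float(np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0)

def c(s):
    """c(s) = integral_0^s y d[1/f(y)] = (s - 1) e^s + 1 when f(y) = e^{-y}."""
    return (s - 1.0) * np.exp(s) + 1.0

def lambda0_row_integral(t, n_grid=20_001):
    """Integral over x of Lambda0(t, x) = f(t) x^{-2} c(min(t, x)) for G = Gamma(2, 1)."""
    x = np.linspace(1e-6, t, n_grid)        # x <= t; integrand tends to 1/2 as x -> 0
    below = trapezoid(c(x) / x**2, x)
    above = c(t) / t                        # x > t: c(t) times integral_t^inf x^{-2} dx
    return np.exp(-t) * (below + above)     # multiply by f(t) = e^{-t}

for t in (0.5, 1.0, 3.0):
    print(t, lambda0_row_integral(t), 1.0 - np.exp(-t))   # f(0+) = 1 for this G
```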

As in [51], we have that $W_{m,n} \rightsquigarrow W$ in $D_0[0,\infty]$, where W is the Gaussian process

$$W(t) = \sqrt{p}\, B_X(G(t)) + \sqrt{1-p}\, f(t) \int_{0<y\le t} B_Y(F(y))\, d\!\left[\frac{1}{f(y)}\right],$$

with $B_X$ and $B_Y$ independent Brownian bridges, and that $U_{m,n} \rightsquigarrow U = \mathcal{F}^{-1}(W)$ in $D_0[0,\infty]$. Here, the symbol $\rightsquigarrow$ refers to weak convergence. This last step can be established using the convergence of $\mathcal{F}_{m,n}$ to $\mathcal{F}$ in operator norm topology, Lemma 3 of [51] and the continuous mapping theorem. A consistent estimator $\hat\psi_U(s,t)$ of $\psi_U(s,t) = E[U(s)U(t)]$ is provided in [51], though in practice the use of resampling methods may yield an estimator of $\psi_U(s,t)$ more expediently.
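The resampling route mentioned above can be sketched as a nonparametric bootstrap that resamples $\mathcal{X}_m$ and $\mathcal{Y}_n$ separately (so m, n and $\hat p$ stay fixed), recomputes $\hat G$, and uses the second moments of $\sqrt{k}(\hat G^* - \hat G)$ on a grid as an estimate of $\psi_U$. The compact EM solver, the grid, and the number of replicates are assumptions of this sketch, and the text does not establish the validity of this particular bootstrap.

```python
import numpy as np

def npmle(x, y, n_iter=500):
    """Compact EM iteration for a discrete solution of the score equation (2.1)."""
    m, k = x.size, x.size + y.size
    t = np.concatenate([x, y])
    is_y = np.concatenate([np.zeros(m, bool), np.ones(y.size, bool)])
    order = np.argsort(t, kind="stable")
    t, is_y = t[order], is_y[order]
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        q = p / t
        f_hat = np.cumsum(q[::-1])[::-1]
        w = np.cumsum(np.where(is_y, 1.0 / f_hat, 0.0))
        p = (np.where(is_y, 0.0, 1.0) + q * w) / k
    return t, p

def cdf_on_grid(t, p, grid):
    """Step function G_hat(s) = sum_{t_j <= s} p_j evaluated at each grid point (t sorted)."""
    cum = np.cumsum(p)
    idx = np.searchsorted(t, grid, side="right")
    return np.where(idx > 0, cum[np.maximum(idx - 1, 0)], 0.0)

def bootstrap_psi_u(x, y, grid, n_boot=100, seed=0):
    """Bootstrap estimate of psi_U(s, t) = E[U(s) U(t)] on grid x grid.

    X and Y are resampled separately; second moments are taken around G_hat,
    mirroring U = sqrt(k) (G_hat - G).
    """
    rng = np.random.default_rng(seed)
    k = x.size + y.size
    g_hat = cdf_on_grid(*npmle(x, y), grid)
    draws = np.empty((n_boot, grid.size))
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        draws[b] = np.sqrt(k) * (cdf_on_grid(*npmle(xb, yb), grid) - g_hat)
    return draws.T @ draws / n_boot

rng = np.random.default_rng(3)
x = rng.gamma(2.0, 1.0, size=150)
y = rng.uniform(size=150) * rng.gamma(2.0, 1.0, size=150)
grid = np.array([0.5, 1.0, 2.0, 4.0])
psi_hat = bootstrap_psi_u(x, y, grid, n_boot=50)
print(np.round(psi_hat, 3))
```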

3. Approximation of the empirical process $U_{m,n}$.

3.1. Strong approximation. Let $\alpha_n$ denote the empirical process of n independent standard uniform random variables. The KMT construction implies that there exists a probability space $(\Omega, \mathscr{F}, P)$ with a sequence of independent standard uniform random variables and a sequence of Brownian bridges $B_n$ such that

$$\|\alpha_n - B_n\|_{[0,1]} = O\!\left(\frac{\log n}{\sqrt{n}}\right) \quad \text{a.s.}$$



Equation (2.4) is key to the strong approximation of $U_{m,n}$. Since $W_{X,m}$ and $W_{Y,n}$ are independent empirical processes associated, respectively, with $\mathcal{X}_m$ and $\mathcal{Y}_n$, in view of the KMT construction, there exist versions of $W_{X,m}$ and $W_{Y,n}$ along with two independent sequences of Brownian bridge processes $B_{X,m}$ and $B_{Y,n}$ such that $B_{X,m}\circ G$ and $B_{Y,n}\circ F$ approximate $W_{X,m}$ and $W_{Y,n}$ at the optimal rate of $\log s/\sqrt{s}$ (here, s is the sample size). Using (2.4), we extend this approximation to $W_{m,n}$ and use properties of $\mathcal{F}$ to find a sequence of Gaussian processes strongly uniformly approximating $U_{m,n}$. The main theorem of this section, Theorem 1, is proved through a sequence of lemmas.

Denote the upper limit of the support of G by $\tau = \sup\{t : G(t) < 1\}$. Given any set B, denote by $I_B$ and $\|\cdot\|_B$ the indicator function and the supremum norm over B, respectively. Write $\|\cdot\|_\infty$ for the case $B = [0,\infty)$. We introduce the following assumptions:

(A1) $\sqrt{k}(\hat p - p) = O(\sqrt{\log\log k})$ for some $p \in (0, 1]$.

(A2) G is continuous and has bounded support ($\tau < \infty$).

(A3) There exists $\alpha_0 > 2$ such that $\lim_{x\downarrow 0} G(x)/x^{\alpha_0} < \infty$.

(A4) There exists $\beta > 0$ such that $\lim_{x\downarrow 0} [1 - G(\tau - x)]/x^{\beta} \in (0, \infty)$.

We begin by obtaining rates for the difference between $\hat G$ and G as well as between $\hat f$ and f in the supremum norm.

Lemma 1. Suppose (A0) holds. Then, for any sequence of nonnegative real numbers $a_{m,n}$, as $k \to \infty$:

(a) $\|\hat G - G\|_\infty = O\big(\sqrt{\log\log k/k}\big)$ a.s.;

(b) $\|\hat f - f\|_{[a_{m,n},\infty)} = O\Big(\gamma_{m,n}^{-1}\sqrt{\log\log k/k} + |f(\gamma_{m,n}) - f(a_{m,n})|\, I_{[0,\gamma_{m,n})}(a_{m,n})\Big)$ a.s.

The above indicates, for example, that in addition to satisfying (A0), $\gamma_{m,n}$ should be such that

$$\gamma_{m,n}^{-1}\sqrt{\frac{\log\log k}{k}} \to 0.$$

If (A3) holds, the sequence $\gamma^\circ_{m,n} = k^{-1/(2a)}$ may be considered, with the choice $a \in (1, \alpha_0/2)$ ensuring that the two requirements above are satisfied. In this case, choosing a as close as possible to $\alpha_0/2$ would yield the fastest rate, modulo logarithmic terms, in part (b) of Lemma 1. We now provide a result on the growth rate of maxima of Wiener processes.
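The arithmetic behind the choice $\gamma^\circ_{m,n} = k^{-1/(2a)}$ is easy to check. The requirement $\gamma_{m,n}^{-1}\sqrt{\log\log k/k} \to 0$ becomes $k^{-(a-1)/(2a)}\sqrt{\log\log k} \to 0$, which holds exactly when $a > 1$, and the exponent $(a-1)/(2a)$ increases with a, which is why a is taken close to $\alpha_0/2$. The sketch evaluates the sequence for two illustrative values of a; it does not check (A0), and the helper names are hypothetical.

```python
import math

def gamma_seq(k, a):
    """Hypothetical helper: truncation level gamma_{m,n} = k^{-1/(2a)}, with k = m + n."""
    return k ** (-1.0 / (2.0 * a))

def requirement(k, a):
    """gamma_{m,n}^{-1} * sqrt(log log k / k), which must tend to zero."""
    return math.sqrt(math.log(math.log(k)) / k) / gamma_seq(k, a)

def rate_exponent(a):
    """Exponent (a - 1) / (2a): requirement(k, a) = k^{-rate} * sqrt(log log k)."""
    return (a - 1.0) / (2.0 * a)

ks = [1e3, 1e6, 1e9, 1e12]
print([round(requirement(k, 1.5), 4) for k in ks])   # a > 1
print([round(requirement(k, 0.8), 4) for k in ks])   # a < 1
```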

Lemma 2. Let $W_n$ be a sequence of standard Wiener processes. Then, as $n \to \infty$,

$$\|W_n\|_{[0,1]} = O\big(\sqrt{\log n}\big) \quad \text{a.s.}$$
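A quick simulation conveys the logarithmic growth in Lemma 2. For independent standard Wiener processes approximated on a grid (independence and the grid are assumptions of the sketch; the lemma allows an arbitrary sequence), the running maximum of $\sup_{[0,1]}|W_n|$ grows like $\sqrt{2\log n}$, so its ratio to $\sqrt{\log n}$ stays bounded.

```python
import numpy as np

def sup_abs_wiener(n_paths, n_steps, rng):
    """sup over [0, 1] of |W(t)| for independent Wiener paths, approximated on a grid."""
    increments = rng.standard_normal((n_paths, n_steps)) / np.sqrt(n_steps)
    return np.abs(np.cumsum(increments, axis=1)).max(axis=1)

rng = np.random.default_rng(4)
n = np.arange(2, 2001)                       # path indices, starting at 2 so log n > 0
sups = sup_abs_wiener(n.size, 1000, rng)
running_max = np.maximum.accumulate(sups)    # max over the first n paths
ratio = running_max / np.sqrt(np.log(n))
print(ratio[[8, 98, 998, -1]])
```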
The next result considers the asymptotic behavior of the sequence of inverse operators $\mathcal{F}_{m,n}^{-1}$. First, we note that the space $D_0[0,\tau]$ endowed with the uniform topology is a Banach space. As such, $\mathscr{A} = \mathcal{L}(D_0[0,\tau], D_0[0,\tau])$, the space of bounded linear operators on $D_0[0,\tau]$ endowed with the operator norm topology, is a Banach algebra. We recall additionally that cadlag functions have at most countably many jumps (see [36]) and are therefore Riemann integrable on bounded intervals.

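Fixing $\varepsilon > 0$, set $\mathcal{I}_\varepsilon(u)(t) = u(t)\, I_{[0,\tau-\varepsilon]}(t)$ and define $\mathcal{F}_{m,n,\varepsilon}$ and $\mathcal{F}_\varepsilon : D_0[0,\tau] \to D_0[0,\tau]$ as

$$\mathcal{F}_{m,n,\varepsilon} = \hat p\,\mathcal{I} + (1-\hat p)\,\mathcal{Q}_{m,n,\varepsilon} \quad\text{and}\quad \mathcal{F}_\varepsilon = p\,\mathcal{I} + (1-p)\,\mathcal{Q}_\varepsilon,$$

respectively, where for any $t \in [0,\tau]$,

$$\mathcal{Q}_{m,n,\varepsilon}(u)(t) = \hat f(t)\,(\mathcal{A}_{m,n}\circ\mathcal{I}_\varepsilon)(u)(t) \quad\text{and}\quad \mathcal{Q}_\varepsilon(u)(t) = f(t)\,(\mathcal{A}\circ\mathcal{I}_\varepsilon)(u)(t).$$

Define $\varepsilon_0 = \tau p^2/(p^2 - 2p + 2)$.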

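Lemma 3. Suppose that (A0)-(A2) hold and that $\varepsilon$ is in $(0, \varepsilon_0)$. Then, considering the operator norm over the space $C_0[0,\tau]$ of continuous functions on $[0,\tau]$ vanishing at the endpoints, as $k \to \infty$,

$$\big\|\mathcal{F}_{m,n,\varepsilon}^{-1} - \mathcal{F}_\varepsilon^{-1}\big\| = O\!\left(\frac{\log(1/\gamma_{m,n})}{f(\tau-\varepsilon)}\left[\gamma_{m,n}^{-1}\sqrt{\frac{\log\log k}{k}} + \cdots\right]\right) \quad\text{a.s.}$$

With the choice $\gamma_{m,n} = \gamma^\circ_{m,n}$, the order above may be simplified to

$$\frac{k^{-(a-1)/(2a)}\,\log k\,\sqrt{\log\log k}}{f(\tau-\varepsilon)} \quad\text{a.s.}$$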

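We now consider a random integral useful in determining the rate of the strong approximation we will construct for $U_{m,n}$.

Lemma 4. Suppose that (A0)-(A2) hold and that $\varepsilon$ is in $(0, \varepsilon_0)$. Then, as $k \to \infty$,

$$\sup_{0\le s\le\tau-\varepsilon} f(s)\left|\int_{0<y\le s} B_{Y,n}(F(y))\, d\!\left[\frac{1}{\hat f(y)} - \frac{1}{f(y)}\right]\right| = O\!\left(\frac{(\cdots\,\log\log k)^{1/4}}{f(\tau-\varepsilon)}\right) \quad\text{a.s.}$$

Remark 1. The above bound also holds for $\varepsilon = \varepsilon_{m,n}$ provided $\varepsilon_{m,n} k/\sqrt{\log\log k} \to \infty$.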
Henceforth, we set $\gamma_{m,n} = \gamma^\circ_{m,n}$ for each m and n. The next lemma establishes the existence of a sequence of Gaussian processes approximating $W_{m,n}$. Define the sequence of processes

$$W^\circ_{m,n}(s) = \sqrt{p}\, B_{X,m}(G(s)) + \sqrt{1-p}\, f(s)\int_{0<y\le s} B_{Y,n}(F(y))\, d\!\left[\frac{1}{f(y)}\right]. \tag{3.1}$$

Lemma 5. Suppose that (A1)-(A3) hold and that $\varepsilon$ is in $(0,\varepsilon_0)$. Then there exists a probability space on which $W_{m,n}$ and $W^\circ_{m,n}$ are defined such that, as $k \to \infty$,

$$\|W_{m,n} - W^\circ_{m,n}\|_{[0,\tau-\varepsilon]} = O\!\left(\frac{k^{-r(a)}\cdots}{f(\tau-\varepsilon)}\right) \quad\text{a.s.},$$

where $r(a) = \min(1/4, (a-1)/(2a))$.

The next lemma extends the result on the growth rate of Wiener processes in Lemma 2 to the sequence of approximating processes (3.1).

Lemma 6. Suppose that (A2) holds and that $p \in (0,1]$. Then, as $k \to \infty$,

$$\|W^\circ_{m,n}\|_\infty = O\big(\sqrt{\log k}\big) \quad\text{a.s.}$$

Having established the existence of a sequence $W^\circ_{m,n}$ of Gaussian processes approximating $W_{m,n}$ and studied the behavior of $\mathcal{F}_{m,n}^{-1}$, we may provide a sequence of Gaussian processes approximating $U_{m,n}$. Define $U^\circ_{m,n} = \mathcal{F}^{-1}(W^\circ_{m,n})$ for each m and n. Since $\mathcal{F}^{-1}$ is a bounded linear operator, $U^\circ_{m,n}$ forms a sequence of Gaussian processes.

Theorem 1. Suppose that (A1)-(A4) hold. Then, on the probability space on which $W_{m,n}$ and $W^\circ_{m,n}$ are defined, we have that, as $k \to \infty$,

$$\|U_{m,n} - U^\circ_{m,n}\|_{[0,\tau-\varepsilon_{m,n}]} = O\big(\varepsilon_{m,n} (\log k)^{3/2}\sqrt{\log\log k}\big) \quad\text{a.s.},$$

where $\varepsilon_{m,n} = k^{-r(a)/(\beta+1)}$ and $r(a) = \min(1/4, (a-1)/(2a))$.
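The exponents in Theorem 1 can be tabulated exactly with rational arithmetic. Because $(a-1)/(2a) \ge 1/4$ precisely when $a \ge 2$, we have $r(a) = 1/4$ for every $a \ge 2$, and the approximation interval $[0, \tau - \varepsilon_{m,n}]$ approaches $[0,\tau]$ more slowly as the tail parameter $\beta$ of (A4) grows. The helper names below are ours.

```python
from fractions import Fraction

def r(a):
    """r(a) = min(1/4, (a - 1) / (2a)), as in Lemma 5 and Theorem 1."""
    return min(Fraction(1, 4), (a - 1) / (2 * a))

def eps_exponent(a, beta):
    """epsilon_{m,n} = k^{-r(a)/(beta + 1)}; returns the exponent r(a) / (beta + 1)."""
    return r(a) / (beta + 1)

for a in (Fraction(3, 2), Fraction(2), Fraction(3)):
    print(a, r(a), eps_exponent(a, Fraction(1)))
```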

Theorem 1 will be crucial in our study of the asymptotic properties of 
kernel density estimators of g, the density associated to G, in Sections 4 
and 5. Other applications of Theorem 1 include oscillation moduli and laws 
of the iterated logarithm; see [16]. 

3.2. Global modulus of continuity. In order to describe the asymptotic properties of the kernel density estimators of g via the above strong approximation, we must establish the global modulus of continuity of the approximating process $U^\circ_{m,n}$.

In the sequel, we say that the distribution G satisfies assumption (A5) if its density g is differentiable, and that a sequence of bandwidths $h_{m,n}$ satisfies assumption (B1) if:

(1) $m h_{m,n} \to \infty$ and $\log h_{m,n}/\log\log m \to -\infty$ as $m, n \to \infty$;

(2) $\sqrt{\log n}\, h_{m,n} \to 0$ and $\sqrt{\log m}\, h_{m,n} \to 0$ as $m, n \to \infty$.

Theorem 2. Suppose that (A1)-(A5) hold, and that the sequence $h_{m,n}$ satisfies (B1). Then, for any $\eta$ in $(0,\tau)$, we have that, as $k \to \infty$,

$$\sup_{0\le t\le\tau-\eta}\ \sup_{0\le s\le h_{m,n}} \big|U^\circ_{m,n}(t+s) - U^\circ_{m,n}(t)\big| = O\Big(\sqrt{h_{m,n}\log(1/h_{m,n})}\Big) \quad\text{a.s.}$$
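The order $\sqrt{h\log(1/h)}$ in Theorem 2 matches Lévy's modulus of continuity of Brownian motion, for which $\sup_{|t-s|\le h}|W(t)-W(s)|/\sqrt{2h\log(1/h)} \to 1$ almost surely as $h \downarrow 0$. As a sanity check of this benchmark, for a standard Wiener process on a fine grid rather than for $U^\circ_{m,n}$ itself, the sketch computes that ratio for one small h.

```python
import numpy as np

def modulus_ratio(h, n_steps=2**18, seed=5):
    """sup over |t - s| <= h of |W(t) - W(s)| on [0, 1], divided by sqrt(2 h log(1/h))."""
    rng = np.random.default_rng(seed)
    w = np.concatenate([[0.0], np.cumsum(rng.standard_normal(n_steps))]) / np.sqrt(n_steps)
    max_lag = int(h * n_steps)                       # lags (in grid steps) up to h
    sup_inc = max(np.abs(w[lag:] - w[:-lag]).max() for lag in range(1, max_lag + 1))
    return sup_inc / np.sqrt(2.0 * h * np.log(1.0 / h))

print(modulus_ratio(1e-3))
```

At moderate h the ratio is typically somewhat above one, since the convergence in Lévy's result is slow.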

4. Asymptotic behavior of kernel density estimators. Consider the kernel density estimator $g_m$ of a univariate density g introduced by [38],

$$g_m(t) = \frac{1}{h_m}\int_0^\infty K\!\left(\frac{t-s}{h_m}\right) dG_m(s), \tag{4.1}$$

where $X_1, \ldots, X_m$ are independent observations from g, K is some kernel function, $h_m$ some bandwidth and $G_m$ the empirical distribution function. The weak and strong uniform consistency of $g_m$ was addressed in [33, 39] and [47], among others. To ensure strong uniform consistency, these authors required that $\sum_m \exp(-c\, m h_m^2) < \infty$ for each $c > 0$. Silverman [42] established the strong uniform consistency of $g_m$ under weaker assumptions using the KMT strong approximation technique. When the observations are subject to random right-censoring, Blum and Susarla [9] proposed estimating g by the estimator in (4.1), replacing $G_m$ by the Kaplan-Meier estimator of G. The properties of the resulting estimator were examined in [9, 19] and [32], among others.

To estimate the density function g under multiplicative censoring, we consider a sequence of kernel density estimators $\hat g_{m,n}$, defined as

$$\hat g_{m,n}(t) = \frac{1}{h_{m,n}}\int_0^\infty K\!\left(\frac{t-s}{h_{m,n}}\right) d\hat G(s), \tag{4.2}$$

where $\hat G$ is, as before, a solution of the nonparametric score equation based on $(\mathcal{X}_m, \mathcal{Y}_n)$.

We introduce an additional set of assumptions to be used in the sequel. The sequence of bandwidths $h_{m,n}$ is said to satisfy assumption (B2) if

$$\lim_{k\to\infty} \frac{\varepsilon_{m,n}(\log k)^{3/2}\sqrt{\log\log k}}{h_{m,n}} = 0.$$

We say that a kernel function K satisfies assumption (K1) if:

(1) K has total variation $V_K < \infty$;

(2) K is supported on $(-1, 1)$;

(3) K is continuous;

(4) $\int K(u)\, du = 1$.

Further, we say that it satisfies assumption (K2) if $\int u K(u)\, du = 0$.
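For a discrete $\hat G$ with masses $p_j$ at support points $t_j$ (for instance, the output of an EM solver for (2.1)), the estimator (4.2) reduces to the finite sum $\hat g_{m,n}(t) = h^{-1}\sum_j p_j K((t - t_j)/h)$. The sketch uses the Epanechnikov kernel $K(u) = \frac{3}{4}(1-u^2) I_{(-1,1)}(u)$, which satisfies (K1) and (K2) and is the kernel used in the simulations of Section 5.2; the function names are ours. With masses $1/m$ on the uncensored data it reduces to the classical estimator (4.1).

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on (-1, 1): continuous, bounded variation, symmetric."""
    u = np.asarray(u, float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde_from_masses(t_eval, support, masses, h, kernel=epanechnikov):
    """g_hat(t) = (1/h) * sum_j masses_j * K((t - support_j) / h): (4.2) for a discrete G_hat."""
    t_eval = np.atleast_1d(np.asarray(t_eval, float))
    u = (t_eval[:, None] - np.asarray(support, float)[None, :]) / h
    return kernel(u) @ np.asarray(masses, float) / h

# Masses 1/m on uncensored data only give the classical estimator (4.1).
rng = np.random.default_rng(6)
x = rng.gamma(2.0, 1.0, size=400)
grid = np.linspace(-1.0, 20.0, 5001)
g_m = kde_from_masses(grid, x, np.full(x.size, 1.0 / x.size), h=0.4)
print(np.sum(g_m) * (grid[1] - grid[0]))     # Riemann sum of the estimate
```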
4.1. Strong uniform consistency. Denote by $\tilde g_{m,n}$ the kernel smoothing of g based on G; that is, write

$$\tilde g_{m,n}(t) = \frac{1}{h_{m,n}}\int_0^\infty K\!\left(\frac{t-s}{h_{m,n}}\right) dG(s).$$

Lemma 7. Suppose that (A1)-(A5) hold, and that $h_{m,n}$ is a sequence of positive bandwidths tending to 0 as $k \to \infty$ and satisfying (B1) and (B2). Suppose also that the kernel function K satisfies (K1). Then, for any $\eta$ in $(0,\tau)$, we have that

$$\lim_{k\to\infty} \|\hat g_{m,n} - \tilde g_{m,n}\|_{[0,\tau-\eta]} = 0 \quad\text{a.s.}$$

Theorem 3. Suppose that (A1)-(A5) hold, and that $h_{m,n}$ is a sequence of positive bandwidths tending to 0 as $k \to \infty$ and satisfying (B1) and (B2). Suppose also that the kernel function K satisfies (K1). Then, for any $\eta$ in $(0,\tau)$, we have that

$$\lim_{k\to\infty} \|\hat g_{m,n} - g\|_{[0,\tau-\eta]} = 0 \quad\text{a.s.}$$

4.2. Strong uniform approximation of the empirical density process. By Theorems 1 and 3, we can find a sequence of Gaussian processes that strongly and uniformly approximates the empirical density process. Let K be an arbitrary density function, and define

$$\varphi_{m,n}(t,s) = \frac{1}{h_{m,n}} K\!\left(\frac{t-s}{h_{m,n}}\right).$$

Denoting by $v_s[\varphi_{m,n}(t,s)]$ the total variation of $\varphi_{m,n}(t,\cdot)$ for fixed t, we refer to the uniform total variation $\sup_t v_s[\varphi_{m,n}(t,s)]$ by $V_{m,n}$.

Theorem 4. Suppose that (A1)-(A5) hold, and that $h_{m,n}$ is a sequence of positive bandwidths tending to 0 as $k \to \infty$ and satisfying (B1) and (B2). Suppose also that the kernel function K satisfies (K1) and (K2), and that g has a bounded second derivative. Then, for any $\eta$ in $(0,\tau)$, we have that

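$$\big\|\sqrt{k}(\hat g_{m,n} - g) - \Gamma_{m,n}\big\|_{[0,\tau-\eta]} = O\Big(\sqrt{k}\, h_{m,n}^2 + V_{m,n}\,\varepsilon_{m,n}(\log k)^{3/2}\sqrt{\log\log k}\Big) \quad\text{a.s.},$$

where we have defined $\Gamma_{m,n}(t) = \int_0^\infty U^\circ_{m,n}(s)\, \varphi_{m,n}(t, ds)$.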

Remark 2. Theorem 4 suggests that the optimal rate for the above approximation is obtained by choosing $h_{m,n} \sim \big(\varepsilon_{m,n}\sqrt{\log\log k/k}\big)^{1/3}\sqrt{\log k}$.

Theorem 4 implies distributional results. The linearization $\psi_U(s-uh, t-vh) - \psi_U(s,t) \sim h$ is useful here. This result is not difficult to show for $p > 1/2$ using representations of $\psi_U$ provided on page 1033 of [51], linearization techniques and the modulus of continuity of process U. The case $p < 1/2$ (i.e., heavy censoring) is more challenging, but can be dealt with using (2.6), (2.7) and an argument similar to that found in the proof of Theorem 2. Using Theorem 4 and the above linearization, we may show that $\sqrt{k h_{m,n}}(\hat g_{m,n} - g)$ is asymptotically Gaussian with mean zero and covariance function $\sigma_g$ estimated consistently by

$$\hat\sigma_g(s,t) = h_{m,n}^{-1}\iint \hat\psi_U(s - u h_{m,n}, t - v h_{m,n})\, dK(u)\, dK(v).$$

5. Integrated squared error of kernel density estimators. A common measure of the global performance of an estimator $g_m$ of a density g is its integrated squared error (ISE), defined as

$$\mathcal{E}_m = \int_{-\infty}^{\infty} [g_m(s) - g(s)]^2\, ds.$$

Use of the ISE is particularly pervasive in simulation studies aiming to compare the performance of various density estimators. Minimization of the mean integrated squared error (MISE) $E[\mathcal{E}_m] = \int_{-\infty}^{\infty} E[g_m(s) - g(s)]^2\, ds$ is often a guiding principle in the construction of kernel density estimators. Steele [44] identified the need to determine the relationship between various measures of accuracy in density estimation. One such measure, the order of $\mathcal{E}_m - E(\mathcal{E}_m)$, is particularly important in statistics. Hall [24] first began addressing the issues raised in [44] by computing the exact order of convergence of $\mathcal{E}_m - E(\mathcal{E}_m)$ to zero using the strong approximation technique developed by Komlos, Major and Tusnady [27] for the standard empirical process. Zhang [56] studied the case of random right-censoring using the strong approximation technique of [10] and [11]. In this section, we consider the ISE $\mathcal{E}_{m,n}$ of the kernel estimator $\hat g_{m,n}$ under multiplicative censoring and derive its asymptotic expansion.

5.1. Asymptotic expansion of the integrated squared error. In the remainder of the paper, we make use of the following assumptions. We say that the kernel function K satisfies assumption (K3) if it has finite second moment $\sigma^2$ and is differentiable. Further, we say that the density g satisfies assumption (A6) if it is twice continuously differentiable. Of course, assumption (A6) implies assumption (A5). Finally, we say that the sequence of bandwidths $h_{m,n}$ satisfies assumption (B3) if

$$\lim_{k\to\infty} \frac{\sqrt{\log k}\,(\log\log k)^{1/6}}{\cdots} = 0,$$

where $\delta(\beta) = 4 + 4\beta/(2\beta + 3)$. In the sequel, we write $\nu$ for $\int K^2(u)\, du$.
The ISE of $\hat g_{m,n}$ on the interval $[u_1, u_2]$ is defined as

$$\mathcal{E}_{m,n}(u_1,u_2) = \int_{u_1}^{u_2} [\hat g_{m,n}(s) - g(s)]^2\, ds.$$

Theorem 5 presents an asymptotic expansion for $\mathcal{E}_{m,n}(0, \tau-\eta)$ for any $\eta$ in $(0,\tau)$.

Theorem 5. Suppose that (A1)-(A4) and (A6) hold with $\alpha_0 > 4$ in (A3) and that a is chosen in $[2, \alpha_0/2)$. Suppose that $h_{m,n}$ is a sequence of positive bandwidths satisfying (B1) and (B3), and that the kernel function K satisfies (K1)-(K3). Then, for any $\eta$ in $(0,\tau)$, we have that

$$\mathcal{E}_{m,n}(0, \tau-\eta) = \frac{h_{m,n}^4\sigma^4}{4}\int_0^{\tau-\eta} [g''(s)]^2\, ds + \cdots$$

Theorem 5 suggests that $h_{m,n}$ should shrink at the rate $k^{-1/C(\beta)}$ modulo logarithmic terms, where $C(\beta) = \max(5, \delta(\beta))$. We note that $\delta(\beta) < 5$ when $\beta < 3/2$. Then, writing $\|g''\|^2_{[0,\tau-\eta]} = \int_0^{\tau-\eta} [g''(s)]^2\, ds$, Theorem 5 suggests that the bandwidth

minimizes the order of the integrated squared error, a direct generalization of the reference rule for uncensored data alone, which we recover for $p = 1$ and $k = m$. Of course, in practice, this bandwidth is unknown; instead, we may substitute $g''$ by some estimate $\hat g''$, and p by $\hat p = m/k$. For example, a reference rule based on a Gamma approximation to G is given by

$$\hat h_{m,n} = 2\hat\beta\left(\frac{\nu}{\sigma^4 \hat p k}\right)^{1/5}, \tag{5.1}$$

where $\hat\beta = \sum_{i=1}^m X_i/(4m)$ is the MLE of $\beta$ based on $\mathcal{X}_m$ and the model $G = G_\beta$, with $g_\beta(x) = x^3\exp(-x/\beta)/(6\beta^4)$ the density associated to $G_\beta$. This distribution satisfies (A3) with $\alpha_0 = 4$ but is a limiting case with respect to the stronger assumption made in Theorem 5. It was selected because it has the least smooth density in the family of densities $\{g_{\alpha,\beta}(x) = x^{\alpha-1}\exp(-x/\beta)/[\Gamma(\alpha)\beta^\alpha] : \alpha \ge 4\}$ with respect to the $L_2$-norm of the second derivative of $g_{\alpha,\beta}$. Alternatively, we may consider kernel smoothing of the uncensored observations alone to obtain a nonparametric pilot estimate $\hat g''$ of $g''$. More robust but computationally intensive cross-validation approaches, as in [29], may also be used for bandwidth selection.
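The constant in the Gamma reference rule can be checked numerically. For $g_\beta(x) = x^3 e^{-x/\beta}/(6\beta^4)$ one has $\int_0^\infty [g_\beta''(x)]^2\,dx = 1/(32\beta^5)$; substituting this into a bandwidth of the standard form $[\nu/(\sigma^4\|g''\|^2 p k)]^{1/5}$ (the standard form is our assumption here) gives $2\beta(\nu/(\sigma^4 p k))^{1/5}$, because $32^{1/5} = 2$. For the Epanechnikov kernel, $\nu = 3/5$ and $\sigma^2 = 1/5$. The sketch reads the rule (5.1) as $\hat h_{m,n} = 2\hat\beta(\nu/(\sigma^4 \hat p k))^{1/5}$ with $\hat\beta = \bar X/4$; the helper names are hypothetical.

```python
import numpy as np

def gamma4_second_derivative(x, beta):
    """g_beta''(x) for the reference density g_beta(x) = x^3 exp(-x/beta) / (6 beta^4)."""
    u = x / beta
    return (u**3 - 6.0 * u**2 + 6.0 * u) * np.exp(-u) / (6.0 * beta**3)

def roughness_gamma4(beta, upper=80.0, n_grid=200_001):
    """Trapezoidal approximation of the integral of [g_beta''(x)]^2 over [0, upper * beta]."""
    x = np.linspace(0.0, upper * beta, n_grid)
    y = gamma4_second_derivative(x, beta) ** 2
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def reference_bandwidth(x_uncensored, k, nu=0.6, sigma2=0.2):
    """h = 2 beta_hat (nu / (sigma^4 p_hat k))^{1/5}, beta_hat = mean(x) / 4.

    Defaults nu = 3/5 and sigma2 = 1/5 are the Epanechnikov values; note p_hat * k = m.
    """
    m = len(x_uncensored)
    beta_hat = float(np.mean(x_uncensored)) / 4.0
    p_hat = m / k
    return 2.0 * beta_hat * (nu / (sigma2**2 * p_hat * k)) ** 0.2

for beta in (0.5, 1.0, 2.0):
    print(beta, roughness_gamma4(beta) * 32.0 * beta**5)     # close to 1
```

Because $\hat p k = m$, this particular rule depends on the data only through the uncensored subsample.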

5.2. Small-sample simulation results: Implementation and efficiency. To provide some illustration of the behavior of the methods proposed, we present below results from a preliminary small-sample simulation study. The objective was to graphically evaluate the general adequacy of the estimators as well as to elucidate the potential contribution of censored observations to overall estimation efficiency, both in small samples. For this purpose, we considered data emanating from the multiplicative censoring model, with underlying Gamma density function $g_\alpha(x) = \Gamma(\alpha)^{-1} x^{\alpha-1}\exp(-x)\, I_{(0,\infty)}(x)$, various sample sizes and differing values of parameter $\alpha$. We found the kernel density estimators proposed to perform generally well. Figure 1 presents 100 sample paths, shown in grey, for various sample sizes and parameter value $\alpha = 5$. Plots in the first column were obtained by discarding all censored observations and performing kernel density estimation using the uncensored observations alone; all observations were used in generating plots in the second column. The pointwise average of the sample plots is shown in solid black, while the true density is the dotted black curve depicted. The first, second and third rows were generated from datasets of 100, 200 and 400
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Table 1 

Average percent increase in ISE and 95% CIs using F(4, /3) parametric reference rule 



Sample size 


a = 3 


a = 4 


a = 5 


a = 6 


50 + 50 


15.2I7.I19.0 


n.3l3.4i5.6 


16.4I8.420.4 


13.gl6.3l8.7 


100 + 100 


16.818.520.2 


14.1I5.817.5 


13.6l5.3l7.0 


9.9II.613.2 


200 + 200 


13.2l4.7ic.l 


n.6l3.1i4.7 


14.417.821.2 


18. 422. 626. 7 



total observations, respectively, with censored and uncensored observations 
equally represented. In all cases, bandwidth values were automatically se- 
lected using the r(4, (3) parametric reference rule (5.1). The Epanechnikov 
kernel K(x) = | (1 — x 2 )I(_i^(x) was used throughout. From these plots, we 
notice that use of the full sample leads to a decrease in variability throughout 
the support. Our empirical findings suggest that this cumulates to a sub- 
stantial decrease in integrated squared error. Table 1 reports estimates and 
associated 95% confidence intervals for the mean relative difference in ISE, 
defined as (ISEo — ISEi)/ISEi, obtained from a simulation of 500 datasets, 
where ISEo and ISEi are the integrated squared errors associated with the 
use of the uncensored subsample and of the full sample, respectively. These 
values describe the mean percent increase in ISE from discarding the cen- 
sored subsample, for various sample sizes and parameter values. 
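The summary reported in Table 1 can be computed from per-dataset ISE values as sketched below. The text does not say how the confidence intervals were formed, so the normal-approximation interval here is an assumption. The function name and the toy inputs are hypothetical and are not the paper's simulation output.

```python
import math
import statistics


def mean_percent_increase(ise_uncensored_only, ise_full):
    """Mean of 100 * (ISE_0 - ISE_1) / ISE_1 over simulated datasets, with a
    normal-approximation 95% confidence interval (an assumed construction)."""
    rel = [100.0 * (i0 - i1) / i1 for i0, i1 in zip(ise_uncensored_only, ise_full)]
    mean = statistics.fmean(rel)
    half_width = 1.96 * statistics.stdev(rel) / math.sqrt(len(rel))
    return mean, (mean - half_width, mean + half_width)


# Toy input with three datasets.
est, ci = mean_percent_increase([1.2, 1.1, 1.3], [1.0, 1.0, 1.0])
```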

The relative performance of the estimators was found to be rather in- 
sensitive to the proximity of the underlying distribution to the parametric 
model specified in the reference rule used, with an average increase in ISE of 
around 10-25%, subsequent to discarding censored observations, regardless 
of sample size and parameter value. Since the performance of kernel den- 
sity estimators hinges upon the performance of the underlying estimator of 
the distribution function as well as the adequacy of the bandwidth selection 
rule, gauging the contribution of censored observations to overall estimation 
efficiency is complicated by the layer of uncertainty associated to bandwidth 
selection. As such, we have also conducted a simulation study, whereby, for 
each simulated dataset, the bandwidth selected was that minimizing the ob- 
served ISE; we refer to this rule as the optimal bandwidth selection rule. 
Of course, such a rule can only be adopted in simulation settings, where 
the true density function is known, and the ISE can be computed directly. 
This approach provides, nonetheless, a clearer view of the gains resulting 
from the inclusion of censored observations in the estimation procedure. Ta- 
ble 2 reports estimates of the mean relative increase in ISE resulting from 
discarding all censored observations along with associated 95% confidence 
intervals. These results seem to suggest that for small and moderate sample 
sizes, when equal numbers of censored and uncensored observations are avail- 
able, ignoring censored observations leads to an increase in ISE of roughly 
10-35%, results consistent with those reported in Table 1. 
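The optimal bandwidth selection rule can be sketched as a grid search that minimizes the observed ISE against the known true density. Several choices here are assumptions: the Epanechnikov kernel, a trapezoidal approximation of the ISE on a bounded interval, and a fixed bandwidth grid. The distribution being smoothed is passed as atoms with weights. For the uncensored subsample these would be the $X_i$ with weights $1/m$; for the full sample they would be the masses of a solution of the score equation. All names and the example atoms are hypothetical.

```python
import math


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde_from_atoms(y, atoms, weights, h):
    """(1/h) * sum_i w_i K((y - t_i)/h): kernel smoothing of a discrete distribution H."""
    return sum(w * epanechnikov((y - t) / h) for t, w in zip(atoms, weights)) / h


def ise(atoms, weights, h, g, lo, hi, n_grid=2000):
    """Trapezoidal approximation of the integral of (kde - g)^2 over [lo, hi]."""
    step = (hi - lo) / n_grid
    total = 0.0
    for j in range(n_grid + 1):
        y = lo + j * step
        diff = kde_from_atoms(y, atoms, weights, h) - g(y)
        total += (0.5 if j in (0, n_grid) else 1.0) * diff * diff
    return total * step


def oracle_bandwidth(atoms, weights, g, h_grid, lo, hi):
    """Bandwidth in h_grid with the smallest ISE; requires the true density g."""
    return min(h_grid, key=lambda h: ise(atoms, weights, h, g, lo, hi))


def gamma5(x):
    """True density g_alpha with alpha = 5, as in the simulation design."""
    return x ** 4 * math.exp(-x) / 24.0 if x > 0 else 0.0


# Illustrative atoms with equal weights (e.g. an empirical distribution).
atoms = [1.5 + 0.35 * i for i in range(20)]
weights = [1.0 / len(atoms)] * len(atoms)
grid = [0.25 * j for j in range(1, 13)]
h_star = oracle_bandwidth(atoms, weights, gamma5, grid, 0.0, 15.0)
```

Truncating the integral at 15 is harmless for this Gamma density, whose mass beyond that point is negligible.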
Table 2

Average percent increase in ISE and 95% CIs using optimal bandwidth selection rule

Sample size   α = 3               α = 4               α = 5               α = 6
50 + 50       14.3 (9.6, 19.0)    15.7 (10.9, 20.5)   14.8 (9.8, 19.9)    32.5 (17.3, 47.7)
100 + 100     16.3 (12.7, 20.0)   17.3 (12.9, 21.6)   15.8 (11.1, 20.5)   26.9 (12.4, 41.3)
200 + 200     13.8 (10.0, 17.6)   17.2 (12.1, 22.4)   21.0 (14.3, 27.7)   34.6 (16.9, 52.3)

Entries: estimated mean percent increase (95% CI).

The above provides a glimpse of the contribution of the censored obser- 
vations in small and moderate samples. It suggests that these observations 
provide nonnegligible information regarding the estimand of interest. We 
may, however, also resort to asymptotic arguments to motivate use of the 
full sample for the sake of efficiency. For any given distribution function H, 
denote the integrated squared error by 

$$\mathrm{ISE}(H,h;g) = \int\left\{\frac{1}{h}\int K\left(\frac{y-u}{h}\right)dH(u) - g(y)\right\}^2 dy$$

and define the optimal bandwidth λ(H; g) as the minimizer of the ISE with respect to the true density g, that is, $\lambda(H;g) = \arg\min_{h>0}\mathrm{ISE}(H,h;g)$. Let $\tilde G_{m,n}$ be any consistent estimator of G based on $(X_m, Y_n)$. The optimal kernel density estimator of g based on $\tilde G_{m,n}$ is then $\tilde g_{m,n} = \omega(\tilde G_{m,n})$, where ω is the operator defined pointwise as

$$\omega(H)(x) = \frac{1}{\lambda(H;g)}\int K\left(\frac{x-u}{\lambda(H;g)}\right)dH(u).$$

Since any solution $\hat G$ of the nonparametric score equation is asymptotically efficient for G (see [51]), it is possible to show, along the lines of Theorem 25.47 of [46], that $\hat g_{m,n} = \omega(\hat G)$ is asymptotically efficient for $g = \omega(G)$.
In particular, the kernel density estimator using the empirical distribution 
function based on uncensored observations alone cannot be expected to be 
asymptotically efficient, given that the latter is itself not efficient for G. It is 
thus clear that, barring additional complications linked to bandwidth selec- 
tion, use of the full sample is preferable to that of the uncensored subsample 
alone. 
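When the estimator of G is a step function with mass $a_i$ at points $t_i$ (as is the solution $G^*$ constructed in the proof of Claim 1 in the Appendix), ω reduces to a finite sum. The ISE then splits into a data term, a cross term and a constant. The display below is a routine expansion under that assumption:

```latex
\omega(\hat G)(x) = \frac{1}{\hat\lambda}\sum_{i} a_i\,K\!\left(\frac{x-t_i}{\hat\lambda}\right),
\qquad \hat\lambda = \lambda(\hat G; g),

\mathrm{ISE}(\hat G,h;g) = \int\Big\{\frac{1}{h}\sum_i a_i K\Big(\frac{y-t_i}{h}\Big)\Big\}^2 dy
  - \frac{2}{h}\sum_i a_i\int K\Big(\frac{y-t_i}{h}\Big)g(y)\,dy + \int g^2(y)\,dy .
```

Only the first two terms depend on h, so the minimization defining $\lambda(\hat G; g)$ uses the true density g only through the cross term.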



6. Length-biased sampling with right-censoring. As discussed in the 
Introduction, the likelihood of length-biased right-censored data is a par- 
ticular case of that exhibited in (1.2). The literature on length-biased sam- 
pling can be traced as far back as [52], with important contributions by 
Fisher [18], Neyman [34] and Zelen [55] in medical applications, and by 
Cox [13] in industrial applications. The rigorous treatment of biased sam- 
pling was initiated in the 1980s by Vardi [48, 49], and furthered by Gill, Vardi 
and Wellner [21], Vardi and Zhang [51], Bickel and Ritov [7], Gilbert [20] 
and, more recently, by Asgharian, M'Lan and Wolfson [2], Asgharian and
Wolfson [3] and Bergeron, Asgharian and Wolfson [5]. The importance of 
biased sampling in medical applications and prevalent cohort studies was 
re-emphasized by Cox and Oakes [14]. 

The lifetime data typically collected on a prevalent cohort consist of triples $(A, R\wedge D, \Delta)$, where A, R and D are, respectively, the current age, the residual lifetime and the residual censoring time, and $\Delta = I\{R \le D\}$ is the censoring indicator. Suppose that D and (A, R) are independent. In one scenario considered in [3], all analyses are carried out conditionally upon the proportion of uncensored individuals, assumed fixed. The observations then consist of

$$(A_i, R_i) \sim f_{A,R\mid\Delta=1}, \quad i = 1,\ldots,m, \qquad\text{and}\qquad (A_j, D_j) \sim f_{A,D\mid\Delta=0}, \quad j = 1,\ldots,n,$$

where $f_{A,R}(a,r) = f_U(a+r)/\int_0^\infty u\,dF_U(u)$ and $f_U$ is the probability density function associated to

$$F_U(t) = \frac{\int_0^t s^{-1}\,dG(s)}{\int_0^\infty s^{-1}\,dG(s)}. \qquad (6.1)$$

The conditional density functions above are explicitly given by

$$f_{A,R\mid\Delta=1}(a,r) = \frac{1-F_D(r)}{p(a+r)}\,dG(a+r) \qquad\text{and}\qquad f_{A,D\mid\Delta=0}(a,d) = \frac{f_D(d)}{1-p}\int_{a+d<z} z^{-1}\,dG(z)$$
for the uncensored and censored subjects, respectively. Here, $f_D$ and $F_D$ are, respectively, the density and distribution functions of the residual censoring random variable D, and $p = \mathrm{pr}(\Delta = 1)$ is the proportion of uncensored individuals. The full likelihood of m uncensored and n censored length-biased observations is thus

$$\prod_{i=1}^m \frac{1-F_D(r_i)}{p\,x_i}\,dG(x_i)\;\prod_{j=1}^n \frac{f_D(d_j)}{1-p}\int_{y_j<z} z^{-1}\,dG(z) \;\propto\; \prod_{i=1}^m x_i^{-1}\,dG(x_i)\;\prod_{j=1}^n \int_{y_j<z} z^{-1}\,dG(z),$$

where $x_i = a_i + r_i$ and $y_j = a_j + d_j$.

Denoting $G_*(t) = P(A+R \le t \mid \Delta = 1)$ and $F_*(t) = P(A+D \le t \mid \Delta = 0)$, with associated density functions $g_*(t)$ and $f_*(t)$, we may verify that

$$g_*(t) = \frac{g(t)}{pt}\int_0^t [1-F_D(r)]\,dr \qquad\text{and}\qquad f_*(t) = \frac{f(t)F_D(t)}{1-p},$$

where f(t) is given by (1.1). Defining the operator

$$\mathcal{I}(u)(t) = \int_{0<x\le t}\frac{g_*(x)}{g(x)}\,du(x)$$

and $\Psi_{m,n} = p\,\mathcal{I}_{m,n} + (1-p)\mathcal{K}_{m,n}$, where $\mathcal{I}_{m,n}$ and $\mathcal{K}_{m,n}$ are empirical counterparts of $\mathcal{I}$ and of the operator $\mathcal{K}$ defined below, Asgharian and Wolfson [3] have derived, under this scenario, the equation $\Psi_{m,n}(U_{m,n}) = W_{m,n}$. Here $W_{m,n}$ is obtained from (2.2) by replacing $W_{X,m}$ and $W_{Y,n}$ by the empirical processes $\sqrt{m}(G_m - G_*)$ and $\sqrt{n}(F_n - F_*)$, respectively. Defining the limiting operator

$$\mathcal{K}(u)(t) = \int_{0<y\le t}\left[\int_{y<z} z^{-1}\,du(z)\right]\left(1-\frac{f(t)}{f(y)}\right)\frac{dF_*(y)}{f(y)}$$

and $\Psi = p\,\mathcal{I} + (1-p)\mathcal{K}$, one can show that $\Psi_{m,n}$ converges almost surely to Ψ in the operator norm topology, and that Ψ is bounded, linear and has a bounded inverse $\Psi^{-1}$ if p > 0.59; see [3].

As discussed in the Introduction, when the observation mechanism generates length-biased samples, it is often of prime interest to make inference about $F_U$ and its density function $f_U$. Substituting $\hat G$ for G in (6.1) yields $\hat F_U$, an asymptotically efficient estimator of $F_U$. The asymptotic properties of $Z_{m,n} = \sqrt{k}(\hat F_U - F_U)$ may be studied via its relation to $U_{m,n}$. Indeed, defining $L_t(x) = x^{-1}[I_{[0,t]}(x) - F_U(t)]$ and using $\int_0^\infty L_s(x)\,dG(x) = 0$, we may write

$$\hat F_U(s) - F_U(s) = \frac{\int_0^\infty L_s(x)\,d[\hat G(x) - G(x)]}{\int_0^\infty x^{-1}\,d\hat G(x)},$$

from which we have that $Z_{m,n}(s) = \int_0^\infty L_s(x)\,dU_{m,n}(x)\big/\int_0^\infty x^{-1}\,d\hat G(x)$. Defining the operator $\mathcal{L}(u)(t) = \int_0^\infty L_t(x)\,du(x)$, we note that if there exists some $\gamma_0 > 0$ such that $G(\gamma_0) = 0$ (in which case G is said to satisfy assumption γ), then the operator $\mathcal{L}$ is bounded. Consequently, Theorems 1-5 hold when making inference about $F_U$ and its density function $f_U$.
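When $\hat G$ is a step function with masses $a_i$ at points $t_i > 0$, substituting it into (6.1) amounts to reweighting each atom by $1/t_i$ and renormalizing. This undoes the over-representation of long lifetimes under length-biased sampling. A minimal sketch under that assumption follows; the function name and the example masses are hypothetical.

```python
def unbiased_cdf_from_atoms(atoms, masses):
    """Plug a discrete estimate of G (masses a_i at points t_i > 0) into (6.1):
    F_U(t) = sum_{t_i <= t} a_i / t_i  /  sum_i a_i / t_i."""
    inv = [a / t for t, a in zip(atoms, masses)]  # inverse-length weights undo the size bias
    total = sum(inv)                              # plug-in estimate of int z^{-1} dG(z)

    def F_U(t):
        return sum(v for s, v in zip(atoms, inv) if s <= t) / total

    return F_U


F_U = unbiased_cdf_from_atoms([1.0, 2.0, 4.0], [1 / 3, 1 / 3, 1 / 3])
print(F_U(1.0), F_U(2.0))  # 4/7 and 6/7 of the mass, up to rounding
```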

Under the additional assumption that the residual censoring distribution has no point mass at zero, it is possible to give an explicit distributional result for the empirical density process arising from kernel density estimation. Specifically, the empirical density process $\sqrt{kh_{m,n}}(\hat f_U - f_U)$ is asymptotically Gaussian with mean zero and covariance function $\sigma_{f_U}$, which is estimated consistently by

$$\hat\sigma_{f_U}(s,t) = h^{-1}_{m,n}\iint \hat\psi_Z(s-uh_{m,n}, t-vh_{m,n})\,dK(u)\,dK(v),$$

where $\hat\psi_Z$ is a consistent estimator of the asymptotic covariance function $\psi_Z$ associated with the sequence of processes $Z_{m,n}$. For example, we may take

$$\hat\psi_Z(s,t) = \hat\mu_U^{-2}\iint \hat\psi_U(x,y)\,d\hat L_s(x)\,d\hat L_t(y),$$

where $\hat\mu_U = \int_0^\infty z^{-1}\,d\hat G(z)$, $\hat L_u(z) = z^{-1}[I_{[0,u]}(z) - \hat F_U(u)]$ and $\hat\psi_U$ is a consistent estimator of the covariance function $\psi_U$ of process U. For $s \le t$ we may write $\psi_U(s,t)$ as

$$\psi_U(s,t) = p\left\{\int_0^s[\beta(x)]^2\,dG_*(x) - \int_0^s\beta(x)\,dG_*(x)\int_0^t\beta(x)\,dG_*(x)\right\} + (1-p)\int_0^s\!\!\int_0^t f(x)f(y)\left\{e(x\wedge y) + h(x\wedge y)\left[\frac{1}{f(x\vee y)} - \frac{1}{f(x\wedge y)}\right] - h(x)h(y)\right\}d\zeta(x)\,d\zeta(y),$$

where we have defined $\zeta(x) = g(x)[p\,g_*(x)]^{-1}$, $h(x) = \int_0^x F_*(y)\,d[1/f(y)]$ and $e(x) = 2\int_0^x h(y)\,d[1/f(y)]$. Consistent estimation of $\psi_U$ is therefore possible by substituting appropriate empirical counterparts into the above.

Assumption γ imposed on G may seem restrictive, but it nonetheless holds in many industrial and medical applications. The case of survival with dementia, studied in [2] and [53], is one example. It is possible to relax this requirement by imposing that G and $F_U$ vanish at zero at a superpolynomial rate, that is, by assuming that $G(t)$ and $F_U(t)$ are $o(t^r)$ as $t \to 0$ for each $r > 0$. This relaxation preserves all results pertaining to G, but it does not directly preserve those pertaining to $F_U$. The unboundedness of $\mathcal{L}$ is problematic, although an application of Tikhonov's regularization method may help to circumvent this problem. This has been explored by Carroll, Rooij and Ruymgaart [12], although not from the perspective of strong approximations.



7. Closing remarks. (1) For distributions with a lighter left tail ($\alpha_0 > 2$) and a heavier right tail (small β), the rate obtained for the strong approximation of $U_{m,n}$ is close to $k^{-1/4}$ modulo logarithmic terms. It is unclear whether better rates can be achieved. If they can, different techniques would be needed to control the rate of $Z_5$ in Lemma 4, since the best rate achievable for $Z_5$ using approximations by Bernstein polynomials is $k^{-1/4}$. As for assumption (B2) on the bandwidth, required to establish Theorem 3, the $k^{-1/4}$ rate in the strong approximation roughly translates into the bandwidth condition $(\log k)^2/(k^{3/4}h_{m,n}) \to 0$ when we further replace the iterated logarithmic term by a logarithmic term. This contrasts with the condition $\log k/(kh_{m,n}) \to 0$ obtained in [42] for uncensored observations alone. Likewise, the rate given in Remark 2, after Theorem 4, is roughly

Vn~(log£) 2 / 3 A 1/4 - 

(2) The theory presented in this paper requires that $\hat p \to p \in (0,1]$. The case p = 0 may itself be of interest. On the one hand, if $\hat p = 0$ for each k, then all observations are multiplicatively censored; this case has been studied by Groeneboom [23], among others. On the other hand, if $\hat p > 0$ for each k, the methods developed in this paper may be adapted as long as $\hat p$ does not vanish too rapidly. Specifically, we may redefine $\mathcal{T}_{m,n} = \hat p\,I + (1-\hat p)\mathcal{Q}$ and

$$W^{\Gamma}_{m,n}(s) = \sqrt{\hat p}\,B_{X,m}(G(s)) + \sqrt{1-\hat p}\,f(s)\int_{0<y<s} B_{Y,n}(F(y))\,d\left[\frac{1}{f(y)}\right].$$

Suppose that $\hat p^{-2}$ is $O(v_k)$ for some sequence $\{v_k\}$ of positive real numbers tending to infinity. Then the strong approximation holds, with $U^{\Gamma}_{m,n}$ redefined as the Gaussian process $\mathcal{T}^{-1}_{m,n}(W^{\Gamma}_{m,n})$ and the rates multiplied by $O(v_k)$. Further, the rate of the global modulus of continuity of $U^{\Gamma}_{m,n}$ is multiplied by $O(v_k)$. This allows one to study the case p = 0. This extension provides insight into the leap between the square-root asymptotics of the canonical multiplicative censoring setting and the cube-root asymptotics of the Grenander estimator when only censored observations are available.

APPENDIX: PROOFS OF MAIN RESULTS 






Proof of Claim 1. If the condition $\sum_{m,n} G(\gamma_{m,n}) < \infty$ is satisfied, it is an immediate consequence of Theorem 1 of Section 10.1 of [41] that $\mathrm{pr}(\min(X_1,\ldots,X_m) < \gamma_{m,n}\ \text{i.o.}) = 0$. Hence, almost surely, we may find $m_0$ and $n_0 \in \mathbb{N}$ such that, for each $m \ge m_0$ and $n \ge n_0$, all uncensored observations $x_1,\ldots,x_m$ are no smaller than $\gamma_{m,n}$. We restrict our attention here to such sufficiently large m and n. Define $\delta_i = I_{\{x_1,\ldots,x_m\}}(t_i)$ for $i = 1,\ldots,k$, and write $r_0 = \min\{i : t_i \ge \gamma_{m,n}\}$. By construction, we must have that $\delta_1 = \cdots = \delta_{r_0-1} = 0$. Define the set

$$\mathfrak{A} = \left\{(a_{r_0}, a_{r_0+1},\ldots,a_k) : 0 \le a_{r_0}, a_{r_0+1},\ldots,a_k \le 1,\ \sum_{i=r_0}^k a_i = 1,\ a_k \ge \frac{1}{k}\right\},$$

a bounded, closed and convex subset of $\mathbb{R}^{k-r_0+1}$. For $i = r_0,\ldots,k$, define

$$\phi_i(a_{r_0},\ldots,a_k) = \delta_i\,\frac{p}{m} + \frac{1-p}{n}\,\frac{a_i}{t_i}\sum_{j=1}^{i}\frac{1-\delta_j}{\sum_{q=\max(j,r_0)}^k a_q/t_q},$$

and $\phi = (\phi_{r_0},\ldots,\phi_k)$. We note that φ is continuous on $\mathfrak{A}$. We want to show that $\phi(\mathfrak{A}) \subseteq \mathfrak{A}$. That the image of $\mathfrak{A}$ under $\phi_i$ is contained in [0, 1] for $i = r_0,\ldots,k$ is clear. That it is contained in [1/k, 1] for i = k is obvious if $\delta_k = 1$. Assume instead that $\delta_k = 0$. Then, defining $\lambda_j = \sum_{q=\max(j,r_0)}^k a_q/t_q$ for $j = 1,\ldots,k-1$ and $\lambda_k = a_k/t_k > 0$, we observe that

$$\phi_k(a_{r_0},\ldots,a_k) = \frac{1-p}{n}\,\frac{a_k}{t_k}\sum_{j=1}^{k}\frac{1-\delta_j}{\lambda_j} \ge \frac{1-p}{n}\,\frac{a_k/t_k}{\lambda_k} = \frac{1-p}{n},$$

from which it follows that the image of $\mathfrak{A}$ under $\phi_k$ is contained in [1/k, 1] when $\delta_k = 0$ as well. Finally, we require the equality $\sum_{i=r_0}^k \phi_i(a_{r_0},\ldots,a_k) = 1$ to hold for any $(a_{r_0},\ldots,a_k) \in \mathfrak{A}$. This can be verified using

$$\sum_{i=r_0}^k\sum_{j=1}^i b_{ij} = \sum_{j=1}^{r_0-1}\sum_{i=r_0}^k b_{ij} + \sum_{j=r_0}^k\sum_{i=j}^k b_{ij}$$

for any array $b_{ij}$, together with the facts that $\max(j, r_0) = r_0$ under the first sum on the right-hand side and $\max(j, r_0) = j$ under the second sum. We may thus use the Brouwer fixed point theorem (see, e.g., Proposition 2.6 on page 52 and Problem 6.7e on page 254 of [54]) to obtain that there exists some $a^* = (a^*_{r_0},\ldots,a^*_k) \in \mathfrak{A}$ such that $\phi(a^*) = a^*$. The distribution function

$$G^*(t) = \sum_{i=r_0}^k a^*_i\, I_{[0,t]}(t_i)$$

is a solution to equation (2.1) with zero mass below $\gamma_{m,n}$. □
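Brouwer's theorem guarantees that a fixed point of φ exists but does not say how to compute one. A natural heuristic is to iterate φ directly, which gives a self-consistency (EM-type) iteration. The sketch below makes several assumptions. It takes $p = m/k$, so that $p/m = (1-p)/n = 1/k$, and it takes $r_0 = 1$, meaning all pooled points $t_1 \le \cdots \le t_k$ lie above $\gamma_{m,n}$ and are positive. Convergence of plain iteration is not established by the argument above. The function names are hypothetical.

```python
def phi(t, delta, a):
    """One application of the map phi with r_0 = 1 and p = m/k (so p/m = (1-p)/n = 1/k)."""
    k = len(t)
    # lam[j] = sum_{q >= j} a_q / t_q, accumulated from the right
    lam = [0.0] * k
    acc = 0.0
    for q in range(k - 1, -1, -1):
        acc += a[q] / t[q]
        lam[q] = acc
    out = []
    run = 0.0  # running value of sum_{j <= i} (1 - delta_j) / lam_j
    for i in range(k):
        if delta[i] == 0:
            run += 1.0 / lam[i]
        out.append((delta[i] + (a[i] / t[i]) * run) / k)
    return out


def self_consistent_masses(x, y, max_iter=10000, tol=1e-13):
    """Iterate phi from the uniform distribution on the pooled sample
    (x: uncensored observations, y: censored observations)."""
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    t = [v for v, _ in pooled]
    delta = [d for _, d in pooled]
    a = [1.0 / len(t)] * len(t)
    for _ in range(max_iter):
        new = phi(t, delta, a)
        change = max(abs(u - v) for u, v in zip(new, a))
        a = new
        if change < tol:
            break
    return t, delta, a
```

With no censored observations the map returns the empirical distribution in one step. When the largest pooled point is censored, each iterate keeps $a_k \ge 1/k$, mirroring the constraint built into $\mathfrak{A}$.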

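Proof of Theorem 1. Using Lemma 1 and the boundedness of $\mathcal{T}^{-1}_{m,n}$, we have for each $t \in [0, \tau-\epsilon]$ that

$$U_{m,n}(t) = \mathcal{T}^{-1}_{m,n,\epsilon}(W_{m,n})(t) + O(\epsilon\sqrt{\log\log k}) \quad \text{a.s.}$$

Similarly, using the definition of $U^{\Gamma}_{m,n}$, Lemma 6 and the boundedness of $\mathcal{T}^{-1}$, we have for each $t \in [0, \tau-\epsilon]$ that

$$U^{\Gamma}_{m,n}(t) = \mathcal{T}^{-1}_{\epsilon}(W^{\Gamma}_{m,n})(t) + O(\epsilon\sqrt{\log k}) \quad \text{a.s.}$$

The result follows from Lemmas 3, 5 and 6 and the inequality

$$\begin{aligned}
\|U_{m,n} - U^{\Gamma}_{m,n}\|_{[0,\tau-\epsilon]} &\le \|\mathcal{T}^{-1}_{m,n,\epsilon}(W_{m,n}) - \mathcal{T}^{-1}_{\epsilon}(W^{\Gamma}_{m,n})\|_{[0,\tau-\epsilon]} + O(\epsilon\sqrt{\log k}) \\
&\le \|\mathcal{T}^{-1}_{m,n,\epsilon}\|\,\|W_{m,n} - W^{\Gamma}_{m,n}\|_{[0,\tau-\epsilon]} + \|\mathcal{T}^{-1}_{m,n,\epsilon} - \mathcal{T}^{-1}_{\epsilon}\|\,\|W^{\Gamma}_{m,n}\|_{[0,\tau-\epsilon]} + O(\epsilon\sqrt{\log k}) \quad \text{a.s.},
\end{aligned}$$

which yields

$$\|U_{m,n} - U^{\Gamma}_{m,n}\|_{[0,\tau-\epsilon]} = O\!\left(\frac{k^{-r(\alpha)}\sqrt{\log k}\,(\log\log k)^{1/4}}{f(\tau-\epsilon)}\right) + O\!\left(\frac{k^{-(\alpha-1)/(2\alpha)}\log k\sqrt{\log\log k}}{f(\tau-\epsilon)}\right)\frac{O(\sqrt{\log k})}{f(\tau-\epsilon)} + O(\epsilon\sqrt{\log k}) \quad \text{a.s.}$$

The use of Lemma 3 is justified by the fact that $U^{\Gamma}_{m,n}$ is almost surely continuous. Since (A4) governs the behavior of $f(\tau-u)$ for small u, the above bound has least order, modulo logarithmic terms, for $\epsilon = \epsilon_{m,n}$. □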

Proof of Theorem 2. Let $t \in [0, \tau-\eta]$ and $s \in [0, h_{m,n}]$. By definition (3.1), linearity of $\mathcal{T}^{-1}$ and the triangle inequality, we have that

$$|U^{\Gamma}_{m,n}(t+s) - U^{\Gamma}_{m,n}(t)| \le I_m(s,t) + J_n(s,t), \qquad (A.1)$$

where we define

$$I_m(s,t) = |\mathcal{T}^{-1}(B_{X,m}\circ G)(t+s) - \mathcal{T}^{-1}(B_{X,m}\circ G)(t)|,$$

$$J_n(s,t) = |\mathcal{T}^{-1}(H_n)(t+s) - \mathcal{T}^{-1}(H_n)(t)|$$

and

$$H_n(t) = f(t)\int_{0<y<t} B_{Y,n}(F(y))\,d\left[\frac{1}{f(y)}\right].$$

We first study $I_m(s,t)$. Write $\varsigma(u)(\cdot) = \int_0^\tau \kappa(\cdot,x)u(x)\,dx$ and note that $\int_0^\tau \kappa_0(\cdot,x)u(x)\,dx = \mathcal{Q}(u)(\cdot)$ for each u. Equations (2.6) and (2.7) then imply that $\varsigma(u) = -(1-p)\mathcal{Q}(u + p\varsigma(u))/p^2$. It follows from (A5) that $M_1 = \|f'\|_{[0,\tau]} < \infty$. We find that

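$$\begin{aligned}
|\mathcal{Q}(w)(t+s) - \mathcal{Q}(w)(t)| &\le |f(t+s)-f(t)|\int_{0<y\le t}\left[y\int_{y<z}\frac{|w(z)|}{z^2}\,dz\right]d\left[\frac{1}{f(y)}\right] + f(t+s)\int_{t<y\le t+s}\left[y\int_{y<z}\frac{|w(z)|}{z^2}\,dz\right]d\left[\frac{1}{f(y)}\right]\\
&\le \left\{|f(t+s)-f(t)|\,\frac{f(0)-f(t)}{f(0)f(t)} + f(t+s)\left[\frac{1}{f(t+s)}-\frac{1}{f(t)}\right]\right\}\|w\|_{[0,\tau]}\\
&\le \frac{2}{f(t)}\,|f(t+s)-f(t)|\,\|w\|_{[0,\tau]} \le \frac{2M_1 s}{f(\tau-\eta)}\,\|w\|_{[0,\tau]},
\end{aligned}$$

from which it follows, using (2.5), that

$$|\varsigma(u)(t+s) - \varsigma(u)(t)| \le \frac{2(1-p)M_1 s}{p^2 f(\tau-\eta)}\|u + p\varsigma(u)\|_{[0,\tau]} \le \frac{2(1-p)M_1 s}{p^2 f(\tau-\eta)}\left(2 + p\|\mathcal{T}^{-1}\|\right)\|u\|_{[0,\tau]}. \qquad (A.2)$$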

Using (2.5) once more, we then have that

$$\sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}} I_m(s,t) \le p^{-1}\sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}}|B_{X,m}(G(t+s)) - B_{X,m}(G(t))| + \sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}}|\varsigma(B_{X,m}\circ G)(t+s) - \varsigma(B_{X,m}\circ G)(t)|.$$

Using (A5), we may show, as in [31] and [41], that

$$\sup_{0\le x\le a_\tau}\sup_{0\le y\le M_0h_{m,n}}|W_{X,m}(x+y) - W_{X,m}(x)| = O\left(\sqrt{h_{m,n}\log(1/h_{m,n})}\right)$$

almost surely, where $a_\tau = G(\tau-\eta)$, $M_0 = \|g\|_{[0,\tau]}$ and $W_{X,m}$ is the Wiener process associated with $B_{X,m}$; see Lemma 1.4.1 of [15]. Hence, by an application of the mean value theorem (MVT), $B_{X,m}\circ G$ also has modulus of continuity $O(\sqrt{h_{m,n}\log(1/h_{m,n})})$. In view of (A.2) and the fact that $\|B_{X,m}\circ G\|_{[0,\tau]}$ is $O(\sqrt{\log m})$ almost surely, we have that

$$\sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}}|\varsigma(B_{X,m}\circ G)(t+s) - \varsigma(B_{X,m}\circ G)(t)| = O(\sqrt{\log m}\,h_{m,n})$$

almost surely. It then follows from the discussion above that

$$\sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}} I_m(s,t) = O\left(\sqrt{h_{m,n}\log(1/h_{m,n})}\right). \qquad (A.3)$$
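The $O(\sqrt{h\log(1/h)})$ modulus used above matches Lévy's modulus of continuity theorem: for a standard Wiener process W on [0, 1], $\sup_{|t-s|\le h}|W(t)-W(s)|/\sqrt{2h\log(1/h)} \to 1$ almost surely as $h \to 0$. The simulation below is only an illustration and plays no part in the proof. The grid size, bandwidth and seed are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
from collections import deque


def max_increment(path, window):
    """max |path[j] - path[i]| over 0 <= j - i <= window, via monotone deques (O(n))."""
    maxq, minq = deque(), deque()
    best = 0.0
    for j, value in enumerate(path):
        while maxq and path[maxq[-1]] <= value:
            maxq.pop()
        maxq.append(j)
        while minq and path[minq[-1]] >= value:
            minq.pop()
        minq.append(j)
        # drop indices that fell out of the window [j - window, j]
        while maxq[0] < j - window:
            maxq.popleft()
        while minq[0] < j - window:
            minq.popleft()
        best = max(best, value - path[minq[0]], path[maxq[0]] - value)
    return best


def levy_ratio(n_steps=2 ** 14, h=2 ** -6, seed=1):
    """Simulated sup of |W(t) - W(s)| over |t - s| <= h on [0, 1], divided by sqrt(2 h log(1/h))."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    window = round(h * n_steps)
    return max_increment(path, window) / math.sqrt(2.0 * h * math.log(1.0 / h))


ratio = levy_ratio()
```

For moderate h the ratio is typically near, but not exactly, 1; the theorem is a limit as $h \to 0$.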

We now turn to $J_n(s,t)$. Defining

$$J'_n(s,t) = |f(t+s)-f(t)|\int_{0<y\le t}|B_{Y,n}(F(y))|\,d\left[\frac{1}{f(y)}\right]$$

and

$$J''_n(s,t) = f(t+s)\int_{t<y\le t+s}|B_{Y,n}(F(y))|\,d\left[\frac{1}{f(y)}\right],$$

we notice that $|H_n(t+s) - H_n(t)| \le J'_n(s,t) + J''_n(s,t)$. Using the MVT, we have that

$$J'_n(s,t) \le \frac{M_1 s}{f(\tau-\eta)}\sup_{0\le y\le 1}|B_{Y,n}(y)|$$

and

$$J''_n(s,t) \le \frac{f(t)-f(t+s)}{f(t)}\sup_{0\le y\le 1}|B_{Y,n}(y)| \le \frac{M_1 s}{f(\tau-\eta)}\sup_{0\le y\le 1}|B_{Y,n}(y)|.$$

Hence, with suprema taken over $0\le t\le\tau-\eta$ and $0\le s\le h_{m,n}$, the quantities $\sup\sup J'_n(s,t)$ and $\sup\sup J''_n(s,t)$, and consequently $\sup\sup|H_n(t+s)-H_n(t)|$, are $O(\sqrt{\log n}\,h_{m,n})$ almost surely. Further, using (A.2), we have that

$$\sup_{0\le t\le\tau-\eta}\sup_{0\le s\le h_{m,n}}|\varsigma(H_n)(t+s) - \varsigma(H_n)(t)| \le \frac{2(1-p)M_1}{p^2f(\tau-\eta)}\left(2+p\|\mathcal{T}^{-1}\|\right)\|H_n\|_{[0,\tau]}\,h_{m,n} = O(\sqrt{\log n}\,h_{m,n}) \quad\text{a.s.},$$

so that, using (2.5), $\sup\sup J_n(s,t) = O(\sqrt{\log n}\,h_{m,n})$ almost surely. The theorem follows from this last result, (A.1) and (A.3). □

Proof of Theorem 3. By the continuity (and hence uniform continuity) of g on [0, τ], the dominated convergence theorem may be used to show that

$$\lim_{k\to\infty}\sup_{0\le s\le\tau-\eta}|\bar g_{m,n}(s) - g(s)| = 0. \qquad (A.4)$$

The theorem follows immediately from Lemma 7 and the triangle inequality. □

Proof of Theorem 4. By Theorem 1 and integration by parts, for any $t \in [0,\tau-\eta]$ we may write that

$$\hat g_{m,n}(t) - g(t) = [\hat g_{m,n}(t) - \bar g_{m,n}(t)] + [\bar g_{m,n}(t) - g(t)] = \frac{1}{\sqrt{k}\,h_{m,n}}\int U^{\Gamma}_{m,n}(t-uh_{m,n})\,dK(u) + O(\delta_{m,n}) + o\!\left(\frac{\epsilon_{m,n}(\log k)^{3/2}\sqrt{\log\log k}}{\sqrt{k}\,h_{m,n}}\right) \quad\text{a.s.},$$

where $\delta_{m,n} = \sup_{0\le t\le\tau-\eta}|\bar g_{m,n}(t) - g(t)|$. The result follows from [37], which shows that $\delta_{m,n} = O(h^2_{m,n})$ and $V_{m,n} = O(1/h_{m,n})$. □
Proof of Theorem 5. Since g is twice continuously differentiable on [0, τ − η], we may write that $\bar g_{m,n}(s) - g(s) = h^2_{m,n}\sigma^2 g''(s)/2 + o(h^2_{m,n})$ uniformly in $s \in [0,\tau-\eta]$. Combining this expansion with (S.1) in the proof of Lemma 7 (see supplementary material [1]) yields

$$\hat g_{m,n}(s) - g(s) = \frac{h^2_{m,n}\sigma^2}{2}g''(s) + \frac{T_{m,n}(s,h_{m,n})}{\sqrt{k}\,h_{m,n}} + o\!\left(\frac{\epsilon_{m,n}(\log k)^{3/2}\sqrt{\log\log k}}{\sqrt{k}\,h_{m,n}}\right) + o(h^2_{m,n}) \quad\text{a.s.}$$

uniformly in $s\in[0,\tau-\eta]$, where $T_{m,n}(s,h) = \int U^{\Gamma}_{m,n}(s-uh)\,dK(u)$. In view of (2.5) and the proof of Theorem 2, we find that

$$T_{m,n}(s,h_{m,n}) = p^{-1/2}\int B_{X,m}(G(s-uh_{m,n}))\,dK(u) + O(\sqrt{\log k}\,h_{m,n}) \quad\text{a.s.}$$

Further, using (B3), we may show for α > 2 that

$$\frac{\epsilon_{m,n}(\log k)^{3/2}\sqrt{\log\log k}}{\sqrt{k}\,h_{m,n}} = o(h^2_{m,n}),$$

and therefore that

$$\hat g_{m,n}(s) - g(s) = \frac{h^2_{m,n}\sigma^2}{2}g''(s) + \frac{\int B_{X,m}(G(s-uh_{m,n}))\,dK(u)}{\sqrt{pk}\,h_{m,n}} + o(h^2_{m,n})$$

almost surely. It then follows that the integrated squared error over $[0,\tau-\eta]$, $\mathrm{ISE}_{m,n}(0,\tau-\eta)$, may be written as

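$$\frac{h^4_{m,n}\sigma^4}{4}\int_0^{\tau-\eta}[g''(s)]^2\,ds + \frac{P_{m,n}(h_{m,n})}{pk\,h^2_{m,n}} + \frac{\sigma^2 h_{m,n}Q_{m,n}(h_{m,n})}{\sqrt{pk}} + o(h^4_{m,n}) + o(h_{m,n})\frac{R_{m,n}(h_{m,n})}{\sqrt{pk}} \quad\text{a.s.},$$

where we have defined

$$P_{m,n}(h) = \int_0^{\tau-\eta}\left[\int B_{X,m}(G(s-uh))\,dK(u)\right]^2 ds,$$

$$Q_{m,n}(h) = \int_0^{\tau-\eta} g''(s)\int B_{X,m}(G(s-uh))\,dK(u)\,ds$$

and

$$R_{m,n}(h) = \int_0^{\tau-\eta}\left|\int B_{X,m}(G(s-uh))\,dK(u)\right| ds.$$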
It follows from [24] that $P_{m,n}(h_{m,n}) = h_{m,n}\nu^2 + o_p(h_{m,n})$, while $Q_{m,n}(h_{m,n})$ and $R_{m,n}(h_{m,n})$ are both $o_p(\sqrt{h_{m,n}})$. We therefore obtain that $\mathrm{ISE}_{m,n}(0,\tau-\eta)$ may be expressed as

The result follows upon noticing that a term of order $o_p(h^{3/2}_{m,n}/\sqrt{k})$ is $o_p(h^4_{m,n} + (kh_{m,n})^{-1})$, since $h^{3/2}_{m,n}/\sqrt{k} \le \{h^4_{m,n} + (kh_{m,n})^{-1}\}/2$. □
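The first step of the proof of Theorem 5 rests on the expansion $\bar g_{m,n}(s) - g(s) = h^2\sigma^2g''(s)/2 + o(h^2)$. The sketch below checks it numerically for the Gamma density $g_\alpha$ with α = 5 from Section 5.2 and the Epanechnikov kernel, for which $\sigma^2 = 1/5$. The evaluation point s = 4 and the bandwidths are arbitrary illustrative choices, and the integral over u is approximated by the midpoint rule.

```python
import math


def gamma5(x):
    """g_alpha with alpha = 5: x^4 e^{-x} / 24 on (0, inf)."""
    return x ** 4 * math.exp(-x) / 24.0 if x > 0 else 0.0


def gamma5_dd(x):
    """Second derivative of gamma5: (12x^2 - 8x^3 + x^4) e^{-x} / 24."""
    return (12 * x ** 2 - 8 * x ** 3 + x ** 4) * math.exp(-x) / 24.0


def smoothed(s, h, n=4000):
    """bar g(s) = int K(u) g(s - u h) du for the Epanechnikov kernel (midpoint rule)."""
    step = 2.0 / n
    total = 0.0
    for j in range(n):
        u = -1.0 + (j + 0.5) * step
        total += 0.75 * (1.0 - u * u) * gamma5(s - u * h)
    return total * step


sigma2 = 1.0 / 5.0  # int u^2 K(u) du for the Epanechnikov kernel
s = 4.0
for h in (0.2, 0.1, 0.05):
    exact_bias = smoothed(s, h) - gamma5(s)
    leading_term = h ** 2 * sigma2 * gamma5_dd(s) / 2.0
    print(h, exact_bias / leading_term)  # close to 1 when the leading term dominates
```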
SUPPLEMENTARY MATERIAL

Additional technical details: Proof of lemmas 

(DOI: 10.1214/11-AOS954SUPP; .pdf). The proof of each lemma in the 
paper is provided in the supplementary material. 